Abstract: Consider the following unequal error protection
scenario. One special message, dubbed the "red alert" message, 
is required to have an extremely small probability of missed 
detection. The remainder of the messages must keep their average 
probability of error and probability of false alarm below a 
certain threshold. The goal then is to design a codebook that 
maximizes the error exponent of the red alert message while 
ensuring that the average probability of error and probability of 
false alarm go to zero as the blocklength goes to infinity. This 
red alert exponent has previously been characterized for discrete 
memoryless channels. This paper completely characterizes the 
optimal red alert exponent for additive white Gaussian noise 
channels with block power constraints. 



I. Introduction 

Communication networks are increasingly being taxed by the enormous demand for instantly available, streaming multimedia. Ideally, we would like to maximize the reliability and data rate of a system while simultaneously minimizing the delay. Yet, in the classical fixed blocklength setting, the reliability function of a code goes to zero as the rate approaches capacity, even in the presence of feedback. This seems to imply that, close to capacity, it is impossible to keep delay low and reliability high. However, this lesson is partially an artifact of the block coding framework. The achievable tradeoff changes in a streaming setting where the bits need not all be decoded by a fixed deadline; rather, each individual bit must be recovered after a certain delay. In this setting, the reliability function measures how quickly the error probability on each bit estimate decays as a function of the delay. Surprisingly, the achievable error exponent can be quite large at capacity if a noiseless feedback link is available and cleverly exploited.

The distinguishing feature of these streaming architectures with feedback is the use of an ultra-reliable special codeword that is transmitted to notify the decoder when it is about to make an error. While this "red alert" codeword requires a significant fraction of the decoding space to attain its very large error exponent, the remaining "standard" codewords merely need their error probability to vanish in the blocklength. One question that seems intimately connected to the streaming delay-reliability tradeoff is how large the red alert error exponent can be made for a fixed blocklength codebook of a given rate. Beyond this streaming motivation, the red alert problem is also connected to certain sensor network scenarios. For example, consider a setting where sensors must send regular updates to the base station using as little power as possible, i.e., using the standard codewords. If an anomaly is detected, the sensors are permitted to transmit at higher power in order to alert the base station with high reliability, which corresponds to our red alert problem.

Prior work has characterized the red alert exponent for discrete memoryless channels (DMCs) [3]-[5]. In this paper, we determine the red alert exponent for point-to-point additive white Gaussian noise (AWGN) channels that operate under block power constraints on both the regular and red alert messages. We derive matching upper and lower bounds on the red alert exponent with a focus on the resulting high-dimensional geometry of the decoding regions. Our code construction can be viewed as a generalization of that used in the discrete case.

A. Related Work 

Previous studies on protecting a special message over a DMC have relied on some variant of the following code construction. First, designate the special codeword to be the repetition of a particular input symbol. Then, generate a fixed composition codebook at the desired rate. This composition is chosen to place the "standard" codewords as far as possible from the special codeword (as measured by the Kullback-Leibler (KL) divergence between induced output distributions) while still allocating each codeword a decoding region large enough to ensure a vanishing probability of error. By construction, the rest of the space is given to the special codeword. Early work by Kudryashov used this strategy to achieve very high error exponents in the bit error setting under an expected delay constraint [1].

In [3], Borade, Nakiboglu, and Zheng study "bit"-wise and "message"-wise unequal error protection (UEP) problems and error exponents. The red alert problem is a message-wise UEP problem in which one message is special and the remaining messages are standard. While [3] focuses on general DMCs near capacity, Lemma 1 of that paper develops a general sharp bound on the red alert exponent for DMCs at any rate below capacity (both with and without feedback). Specializing to the exponent achieved at capacity, let $\mathcal{X}$ denote the input alphabet, $\{P_{Y|X}(\cdot|x)\}_{x \in \mathcal{X}}$ the channel transition matrix, and $P_Y^*(\cdot)$ the capacity-achieving output distribution of the DMC. Then, the optimal red alert exponent at capacity is

$$E_{\text{alert}}(C) = \max_{x \in \mathcal{X}} D\big(P_Y^*(\cdot)\,\big\|\,P_{Y|X}(\cdot|x)\big) \qquad (1)$$

where $D(\cdot\|\cdot)$ is the KL divergence. We also mention recent work by Nakiboglu et al. [5], [6] that considers the generalization where a strictly positive error exponent is required of the standard messages.

For the binary symmetric channel (BSC), the optimal red alert exponent has a simple and illustrative form. This exponent can be inferred from the general expression in [3, Lemma 1] or via a direct proof due to Sahai and Draper [4] (which appeared concurrently with the conference version [7] of [3]). Let $p$ denote the crossover probability of the BSC and $q$ the probability that a symbol in the codebook is a one. Then, the optimal red alert exponent as a function of rate $R < C$ for the BSC is

$$E_{\text{alert}}(R) = \max_{q \,:\, h_B(q * p) - h_B(p) \geq R} D(q * p \,\|\, p) \qquad (2)$$

where $h_B(p) = -p\log p - (1-p)\log(1-p)$, $q * p = p(1-q) + q(1-p)$, and $D(a\|b) = a\log\left(\frac{a}{b}\right) + (1-a)\log\left(\frac{1-a}{1-b}\right)$.
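As a numerical sanity check on (2), the maximization over $q$ can be done by brute force. The following is a minimal Python sketch, assuming rates in nats and a uniform grid over $q \in [0,1]$; the function names and grid size are our own illustrative choices, not from the paper. At $R = 0$ it should return $D(1-p\|p)$, and as $R \to C$ it should approach $D(1/2\|p)$, which is (1) specialized to the BSC, whose capacity-achieving output distribution is uniform.

```python
import math

def h_b(x: float) -> float:
    """Binary entropy in nats, using the convention 0 log 0 = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def kl_bern(a: float, b: float) -> float:
    """Binary KL divergence D(a || b) in nats, assuming 0 < b < 1."""
    total = 0.0
    if a > 0.0:
        total += a * math.log(a / b)
    if a < 1.0:
        total += (1 - a) * math.log((1 - a) / (1 - b))
    return total

def bsc_red_alert_exponent(R: float, p: float, grid: int = 20_000) -> float:
    """Brute-force evaluation of (2) over q = 0, 1/grid, ..., 1."""
    best = -math.inf
    for k in range(grid + 1):
        q = k / grid
        a = p * (1 - q) + q * (1 - p)  # q * p: probability that an output symbol is a one
        # The tiny slack absorbs floating-point rounding when the constraint is tight.
        if h_b(a) - h_b(p) >= R - 1e-12:
            best = max(best, kl_bern(a, p))
    return best

p = 0.1
C = math.log(2) - h_b(p)  # BSC capacity in nats
for R in (0.0, 0.5 * C, C - 1e-9):
    print(f"R = {R:.4f} nats -> E_alert(R) = {bsc_red_alert_exponent(R, p):.4f}")
```

Because $D(q*p\|p)$ grows as $q*p$ moves toward $1-p$, the rate constraint is what forces the standard codewords back toward $q = 1/2$ as $R$ increases.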

Csiszár studied a related problem where multiple special messages require higher reliability in [8]. Upper bounds for multiple special messages with different priority levels were also developed in [3]. In [9], Borade and Sanghavi examined the red alert problem from a coding theoretic perspective. As shown by Wang [10], similar issues arise in certain sparse communication problems where the receiver must determine whether a codeword was sent or the transmitter was silent.

The fundamental mechanism through which high red alert exponents are achieved is a binary hypothesis test. By designing the induced distributions at the output of the channel to be far apart as measured by KL divergence, we can distinguish whether the red alert or some standard codeword was sent. The test threshold is biased to minimize the probability of missed detection and is analyzed via an application of Stein's Lemma. This sort of biased hypothesis test occurs in numerous other communication settings with feedback, such as [11]-[13] and, as mentioned earlier, these codes are also used as a component in streaming data systems (see, for instance, [1], [2], [14], [15]). There is also a rich literature on the interplay between hypothesis testing and information theory, which we cannot do justice to here (see, for instance, [16]-[18]).

II. Problem Statement

First, we mention some of our notational choices. We will use boldface lowercase letters to denote column vectors, $\mathbf{0}$ to denote the all zeros vector, and $\mathbf{1}$ to denote the all ones vector. Throughout the paper, the log function is taken to be the natural logarithm and rate is measured in nats instead of bits. We use $\|\mathbf{x}\|$ to denote the Euclidean norm of the vector $\mathbf{x}$.

Definition 1 (Messages): The transmitter has a message $w \in \{0, 1, 2, \ldots, M\}$ that it wants to convey to the receiver. One of the messages, $w = 0$, is a red alert message that will be afforded extra error protection. We assume the red alert message is chosen with some probability greater than 0 and the remaining messages are chosen with equal probability.

Definition 2 (Encoder): The encoder $\mathcal{E}$ maps the message $w$ into a length-$n$ real-valued codeword $\mathbf{x}$ for transmission over the channel, $\mathcal{E} : \{0, 1, 2, \ldots, M\} \to \mathbb{R}^n$. Let $\mathbf{x}(w)$ denote the codeword used for message $w$ and let $\mathcal{C}$ denote the entire codebook, $\mathcal{C} = \{\mathbf{x}(0), \mathbf{x}(1), \ldots, \mathbf{x}(M)\}$. The standard codewords must satisfy an average block power constraint across codewords,

$$\frac{1}{M}\sum_{w=1}^{M}\|\mathbf{x}(w)\|^2 \le nP_{\text{avg}}. \qquad (3)$$

In addition, the red alert codeword must satisfy a less stringent power constraint,

$$\|\mathbf{x}(0)\|^2 \le nP_{\text{alert}}, \qquad (4)$$

for some $P_{\text{alert}} > P_{\text{avg}}$. The rate of the codebook is

$$R = \frac{1}{n}\log M \qquad (5)$$

nats per channel use.

Remark 1: Note that our codebook average power constraint (3) is less strict than the usual block power constraint $\|\mathbf{x}(w)\|^2 \le nP_{\text{avg}}$. Our achievable scheme can be easily modified to meet this constraint using expurgation. Furthermore, our red alert power constraint (4) is less strict than a peak power constraint $|x_i(0)|^2 \le P_{\text{alert}}\ \forall i$, where $x_i(0)$ denotes the $i$th symbol of the red alert codeword. Our scheme sets the red alert codeword to be $\mathbf{x}(0) = -\sqrt{P_{\text{alert}}}\,\mathbf{1}$, which naturally satisfies a peak power constraint. Therefore, our main results also hold under the usual block power constraint and a peak power constraint.

Remark 2: We omit the red alert codeword from the average 
block power constraint for the sake of simplicity. Another 
possibility would be to consider only an average block power
constraint over both the standard and red alert codewords. This
would lead to two different tensions between maximizing the 
red alert exponent and maximizing the rate. The first would be 
the allocation of the decoding regions and the second would 
be the allocation of power based on the probability of a red 
alert message. By using two separate power constraints, we 
can state our results in a simpler form that does not depend 
on the red alert probability. 

Definition 3 (Channel): The channel outputs the transmitted vector, corrupted by independent and identically distributed (i.i.d.) Gaussian noise:

$$\mathbf{y} = \mathbf{x} + \mathbf{z} \qquad (6)$$

where $\mathbf{z} \sim \mathcal{N}(\mathbf{0}, N\mathbf{I}_{n \times n})$ for some noise variance $N > 0$.

Definition 4 (Decoder): The signal observed by the receiver is sent into a decoder which produces an estimate $\hat{w}$ of the transmitted message $w$, $\mathcal{D} : \mathbb{R}^n \to \{0, 1, 2, \ldots, M\}$.

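Definition 5 (Error Probability): We are concerned with three quantities: the probability of missed detection of the red alert message $p_{\text{md}}$, the probability of false alarm $p_{\text{FA}}$, and the average probability of error of all other messages $p_{\text{msg}}$,

$$p_{\text{md}} = \mathbb{P}(\hat{w} \neq 0 \mid w = 0) \qquad (7)$$

$$p_{\text{FA}} = \mathbb{P}(\hat{w} = 0 \mid w \neq 0) \qquad (8)$$

$$p_{\text{msg}} = \frac{1}{M}\sum_{w=1}^{M}\mathbb{P}(\hat{w} \neq w \mid w \text{ sent},\ \hat{w} \neq 0). \qquad (9)$$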

Definition 6 (Error Exponent): We say that a red alert exponent of $E_{\text{alert}}(R)$ is achievable if for every $\epsilon > 0$ and $n$ large enough, there exists a rate-$R$ encoder and a decoder such that

$$-\frac{1}{n}\log(p_{\text{md}}) \ge E_{\text{alert}}(R) - \epsilon \qquad (10)$$

$$p_{\text{FA}} \le \epsilon \qquad (11)$$

$$p_{\text{msg}} \le \epsilon. \qquad (12)$$

In other words, we would like the red alert codeword to have as large an error exponent as possible while keeping the other error probabilities small. The standard codewords do not need to have a positive error exponent. Of course, the rate must be lower than the AWGN capacity, $R < C$, where

$$C = \frac{1}{2}\log\left(1 + \frac{P_{\text{avg}}}{N}\right). \qquad (13)$$



A. High-Dimensional Geometry 

We now review some basic facts of high-dimensional ge- 
ometry that will be useful in our analysis. 

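Let $\mathcal{B}_n(\mathbf{a}, r)$ denote the $n$-dimensional ball centered at $\mathbf{a} \in \mathbb{R}^n$ with radius $r > 0$. Recall that the volume of $\mathcal{B}_n(\mathbf{a}, r)$ is

$$\mathrm{Vol}(\mathcal{B}_n(\mathbf{a}, r)) = \frac{\pi^{n/2} r^n}{\Gamma\left(\frac{n}{2} + 1\right)} \qquad (14)$$

where $\Gamma(\cdot)$ is the gamma function [19, Ch. 1, Eq. 16]. We define $\mathcal{S}_n(\mathbf{a}, r)$ to be the surface of $\mathcal{B}_n(\mathbf{a}, r)$. Its surface area (or, more precisely, the $(n-1)$-dimensional volume of its surface) is

$$\mathrm{Vol}(\mathcal{S}_n(\mathbf{a}, r)) = \frac{n\pi^{n/2} r^{n-1}}{\Gamma\left(\frac{n}{2} + 1\right)} \qquad (15)$$

[19, Ch. 1, Eq. 19]. The dimension of the $\mathrm{Vol}(\cdot)$ function will always be clear from the context. We also define

$$\mathcal{T}_n(\mathbf{a}, r_1, r_2) = \{\mathbf{x} : r_1 \le \|\mathbf{x} - \mathbf{a}\| \le r_2\} \qquad (16)$$

to be the spherical shell centered at $\mathbf{a}$ from radius $r_1$ to $r_2$.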
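Because $\Gamma(n/2+1)$ overflows quickly, (14) and (15) are easiest to evaluate in the log domain. The Python sketch below (hypothetical helper names, our own parameter choices) does so with `math.lgamma`, checks by a central finite difference that (15) is the radial derivative of (14), and computes the fraction $1 - (1-\epsilon)^n$ of a ball's volume lying in the outer shell $\mathcal{T}_n(\mathbf{a}, (1-\epsilon)r, r)$. That fraction tends to one as $n$ grows, which is the high-dimensional effect behind the thin-shell arguments used later.

```python
import math

def log_ball_volume(n: int, r: float) -> float:
    """log Vol(B_n(a, r)) = (n/2) log(pi) + n log(r) - log Gamma(n/2 + 1), per (14)."""
    return 0.5 * n * math.log(math.pi) + n * math.log(r) - math.lgamma(n / 2 + 1)

def log_sphere_area(n: int, r: float) -> float:
    """log Vol(S_n(a, r)) = log(n) + (n/2) log(pi) + (n - 1) log(r) - log Gamma(n/2 + 1), per (15)."""
    return math.log(n) + 0.5 * n * math.log(math.pi) + (n - 1) * math.log(r) - math.lgamma(n / 2 + 1)

def shell_fraction(n: int, eps: float) -> float:
    """Fraction of Vol(B_n(a, r)) that lies in the shell T_n(a, (1 - eps) r, r)."""
    return 1.0 - (1.0 - eps) ** n

# Surface area should equal d/dr of the volume (central difference at n = 10, r = 1.3).
h = 1e-6
deriv = (math.exp(log_ball_volume(10, 1.3 + h)) - math.exp(log_ball_volume(10, 1.3 - h))) / (2 * h)
print(deriv, math.exp(log_sphere_area(10, 1.3)))
for dim in (10, 100, 1000):
    print(dim, shell_fraction(dim, 0.01))
```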
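The angle between two $n$-dimensional vectors $\mathbf{a}$ and $\mathbf{b}$ is

$$\angle(\mathbf{a}, \mathbf{b}) = \cos^{-1}\left(\frac{\mathbf{a}^T\mathbf{b}}{\|\mathbf{a}\|\,\|\mathbf{b}\|}\right) \qquad (17)$$

where $\cos^{-1}(\cdot)$ takes values between $0$ and $\pi$. Let $\mathcal{V}_n(\mathbf{a}, \mathbf{b}, \theta)$ denote the $n$-dimensional cone with its origin at $\mathbf{a}$, its center axis running from $\mathbf{a}$ to $\mathbf{b}$, and of half-angle $\theta$, which takes values from $0$ to $\pi/2$. The solid angle $\Omega(\theta)$ of an $n$-dimensional cone of half-angle $\theta$ is the fraction of surface area that it carves out of an $n$-dimensional sphere,

$$\Omega(\theta) = \frac{\mathrm{Vol}\big(\mathcal{V}_n(\mathbf{0}, \mathbf{1}, \theta) \cap \mathcal{S}_n(\mathbf{0}, r)\big)}{\mathrm{Vol}\big(\mathcal{S}_n(\mathbf{0}, r)\big)}. \qquad (18)$$

Note that the solid angle is the same for any sphere radius $r > 0$.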
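For moderate $n$, the solid angle (18) can also be computed exactly, since the surface measure at polar angle $\varphi$ from the cone axis is proportional to $\sin^{n-2}\varphi$. The Python sketch below (our own helper names; Simpson's rule is simply one convenient quadrature choice, not the paper's method) evaluates $\Omega(\theta) = \int_0^\theta \sin^{n-2}\varphi\,d\varphi \big/ \int_0^\pi \sin^{n-2}\varphi\,d\varphi$ and compares it with the leading term of Shannon's approximation in Lemma 1 below. For $n = 3$ it reduces to the familiar cap fraction $(1-\cos\theta)/2$.

```python
import math

def solid_angle_exact(theta: float, n: int, steps: int = 100_000) -> float:
    """Omega(theta) from (18) as a ratio of one-dimensional integrals.

    Valid for n >= 3 and 0 < theta < pi/2. The denominator has the closed form
    sqrt(pi) * Gamma((n - 1)/2) / Gamma(n/2). The integrand is rescaled by
    sin^(n-2)(theta) so that large n does not underflow.
    """
    m = n - 2
    log_sin_theta = math.log(math.sin(theta))

    def g(phi: float) -> float:
        # (sin(phi) / sin(theta))^m, with the phi = 0 endpoint handled explicitly
        if phi == 0.0:
            return 0.0
        return math.exp(m * (math.log(math.sin(phi)) - log_sin_theta))

    h = theta / steps  # steps must be even for composite Simpson's rule
    total = g(0.0) + g(theta)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(k * h)
    integral = total * h / 3
    log_denominator = 0.5 * math.log(math.pi) + math.lgamma((n - 1) / 2) - math.lgamma(n / 2)
    return math.exp(m * log_sin_theta + math.log(integral) - log_denominator)

def solid_angle_shannon(theta: float, n: int) -> float:
    """Leading term of Lemma 1: sin^n(theta) / (sqrt(2 pi n) sin(theta) cos(theta))."""
    return math.sin(theta) ** n / (math.sqrt(2 * math.pi * n) * math.sin(theta) * math.cos(theta))

for n in (10, 100, 400):
    exact = solid_angle_exact(math.pi / 4, n)
    print(n, exact, solid_angle_shannon(math.pi / 4, n) / exact)
```

The printed ratio should drift toward one as $n$ grows, consistent with the $(1 + O(1/n))$ factor in Lemma 1.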

Lemma 1 (Shannon): The solid angle of a cone with half-angle $\theta$ satisfies

$$\Omega(\theta) = \frac{\sin^n\theta}{\sqrt{2\pi n}\,\sin\theta\cos\theta}\left(1 + O\!\left(\frac{1}{n}\right)\right).$$

See the math leading up to Equation 28 in [20] for a proof.

III. Main Result

In the binary case, the simplest characterization of the optimal codebook is a statistical one: the red alert codeword is the zero vector and the remaining codewords are of a constant composition. From one perspective, this can be visualized as placing the red alert codeword in the "center" of the space with the other codewords encircling it (see Figure 1). This corresponds to choosing the red alert codeword to be the all zeros (or all ones) vector. The standard codewords are generated using the distribution that maximizes the KL divergence between output distributions while still supporting a rate $R$. While this two-dimensional illustration is quite useful for understanding the binary case, it can be misleading in the Gaussian case. Specifically, it suggests that we should place the red alert codeword at the origin, which turns out to be suboptimal.



Fig. 1. For a BSC, we can visualize the red alert codeword (solid square) 
sitting in the "center" of the codebook and the standard codewords (solid 
circles) occupying a thin shell around it. While this illustration is generally 
sufficient for developing the intuition behind the discrete case, it does not 
capture the full story in the Gaussian case. 

Another way of looking at the binary construction is to visualize each fixed composition as a parallel (or circle of constant latitude) on a sphere (see Figure 2). That is, the code lives on the Hamming cube in $n$ dimensions, which can be imagined as a sphere by taking the all zeros and all ones vectors as the two poles and specifying the parallels by their Hamming weight. From this viewpoint, the binary construction sets the red alert codeword to be one of the poles and chooses the remaining codewords on the furthest parallel that can support a codebook of rate $R$. This perspective leads naturally to the right construction for the Gaussian case. Essentially, the standard codewords are placed uniformly along a constant parallel. This can be achieved by generating the standard codewords using a capacity-achieving code with a fraction $\alpha$ of the total power. The red alert codeword is placed at the furthest limit of the red alert power constraint (e.g., at $-\sqrt{P_{\text{alert}}}\,\mathbf{1}$) and the standard codewords are offset in the opposite direction (e.g., by $\sqrt{(1-\alpha)P_{\text{avg}}}\,\mathbf{1}$). See Figure 3 for an illustration. In the high-dimensional limit, most of the codewords will live on a parallel, thus mimicking the binary construction. This scheme leads us to the optimal red alert exponent.

Theorem 1: For an AWGN channel with red alert power constraint $P_{\text{alert}}$, average power constraint $P_{\text{avg}}$, and rate $R$, the optimal red alert exponent is

$$E_{\text{alert}}(R) = \frac{P_{\text{alert}} + P_{\text{avg}} + 2\sqrt{P_{\text{alert}}\big(P_{\text{avg}} + N(1 - e^{2R})\big)}}{2N} - R.$$

We prove achievability in Lemma 7 and provide a matching upper bound in Lemma 11.

Fig. 2. From an alternate viewpoint, we can visualize the red alert codeword (solid square) sitting on the pole of a sphere and the standard codewords (solid circles) on a parallel on the opposite hemisphere. Our code construction for the Gaussian case is inspired by this picture. If the red alert power constraint is larger than the average power constraint, the red alert codeword should be placed on the sphere's axis but off its surface, directly above the pole.
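Theorem 1 is straightforward to evaluate numerically. The Python sketch below computes $E_{\text{alert}}(R)$ in nats; the function name and the sample powers $P_{\text{alert}} = 4$, $P_{\text{avg}} = 1$, $N = 1$ are illustrative choices of our own. At $R = 0$ it returns $(\sqrt{P_{\text{alert}}} + \sqrt{P_{\text{avg}}})^2/(2N)$, and at $R = C$ the square-root term vanishes, leaving $(P_{\text{alert}} + P_{\text{avg}})/(2N) - C$.

```python
import math

def awgn_red_alert_exponent(R: float, P_alert: float, P_avg: float, N: float) -> float:
    """Theorem 1 for 0 <= R <= C = 0.5 * log(1 + P_avg / N), with R in nats."""
    C = 0.5 * math.log(1 + P_avg / N)
    if not 0.0 <= R <= C:
        raise ValueError("rate must lie in [0, C]")
    # P_avg + N(1 - e^{2R}) is the power left for the offset, i.e., (1 - alpha) P_avg;
    # clip tiny negative values caused by rounding at R = C.
    offset_power = max(P_avg + N * (1 - math.exp(2 * R)), 0.0)
    coherent_gain = 2 * math.sqrt(P_alert * offset_power)
    return (P_alert + P_avg + coherent_gain) / (2 * N) - R

P_alert, P_avg, N = 4.0, 1.0, 1.0
C = 0.5 * math.log(1 + P_avg / N)
for R in (0.0, C / 2, C):
    print(f"R = {R:.4f} nats -> E_alert(R) = {awgn_red_alert_exponent(R, P_alert, P_avg, N):.4f}")
```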

In the conference version of this paper [21], we used a different code construction that led to a smaller achievable red alert exponent. The codewords were generated uniformly on the sphere of radius $\sqrt{nP_{\text{avg}}}$ and we only kept those that fell within a cone of appropriate half-angle. This type of construction turns out not to achieve as dense a packing as the construction used in this paper. In Appendix D, we explore the reasons why this occurs in the binary case. In Appendix E, we state the achievable red alert exponent for the conical construction.




Fig. 3. Red alert codebook construction. The red alert codeword (solid square) is placed at $-\sqrt{P_{\text{alert}}}\,\mathbf{1}$, which takes it a distance $\sqrt{nP_{\text{alert}}}$ from the origin (circle). The standard codewords (shaded region) are drawn i.i.d. according to a Gaussian distribution with variance $\alpha P_{\text{avg}} - \Delta$. These codewords are pushed away from the origin by an offset $\sqrt{(1-\alpha)P_{\text{avg}}}\,\mathbf{1}$ (dashed line).

IV. Codebook Construction 

Our codebook construction for $\mathcal{C}$ consists of the following steps:

1) Choose $\epsilon > 0$ so that $R < C - \epsilon$.

2) The red alert codeword is placed at the boundary of the red alert power constraint, $\mathbf{x}(0) = -\sqrt{P_{\text{alert}}}\,\mathbf{1}$.

3) Choose $0 < \alpha < 1$ so that

$$R + \epsilon = \frac{1}{2}\log\left(1 + \frac{\alpha P_{\text{avg}}}{N}\right) \qquad (19)$$

and choose $\Delta > 0$ so that

$$R < \frac{1}{2}\log\left(1 + \frac{\alpha P_{\text{avg}} - \Delta}{N}\right). \qquad (20)$$

4) Draw $e^{nR}$ codewords $\mathbf{v}(1), \ldots, \mathbf{v}(e^{nR})$ i.i.d. according to a Gaussian distribution with mean zero and variance $\alpha P_{\text{avg}} - \Delta$.

5) To each of these codewords, add an offset $\sqrt{(1-\alpha)P_{\text{avg}}}\,\mathbf{1}$ so that the transmitted codeword for each message (other than $w = 0$) is $\mathbf{x}(w) = \sqrt{(1-\alpha)P_{\text{avg}}}\,\mathbf{1} + \mathbf{v}(w)$.

We will show that this procedure yields a random codebook $\mathcal{C}$ whose false alarm probability and average probability of error are both less than $\epsilon$. Afterwards, we will characterize the probability of missed detection for the red alert codeword. This will in turn imply the existence of a good fixed codebook.
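The construction steps can be mimicked at toy scale. The following Python sketch is illustrative only: `num_codewords` is a hypothetical stand-in for $e^{nR}$, which is far too large to enumerate, and all parameter values and helper names are our own choices. It solves (19) for $\alpha$, builds $\mathbf{x}(0)$ and a handful of offset Gaussian codewords, and checks that the empirical power per symbol is close to $(1-\alpha)P_{\text{avg}} + \alpha P_{\text{avg}} - \Delta = P_{\text{avg}} - \Delta$.

```python
import math
import random

def alpha_for_rate(R: float, eps: float, P_avg: float, N: float) -> float:
    """Solve (19), R + eps = 0.5 * log(1 + alpha * P_avg / N), for the power split alpha."""
    return (N / P_avg) * (math.exp(2 * (R + eps)) - 1)

def build_codebook(n, num_codewords, P_alert, P_avg, alpha, Delta, seed=0):
    """Toy random red alert codebook following steps 2), 4), and 5).

    num_codewords is a hypothetical stand-in for e^{nR}.
    """
    rng = random.Random(seed)
    red_alert = [-math.sqrt(P_alert)] * n      # step 2): x(0) = -sqrt(P_alert) * 1
    offset = math.sqrt((1 - alpha) * P_avg)    # step 5): shift along the all-ones direction
    sd = math.sqrt(alpha * P_avg - Delta)      # step 4): entries of v(w) are i.i.d. N(0, alpha*P_avg - Delta)
    standard = [[offset + rng.gauss(0.0, sd) for _ in range(n)] for _ in range(num_codewords)]
    return red_alert, standard

P_alert, P_avg, N, Delta = 4.0, 1.0, 1.0, 0.05
alpha = alpha_for_rate(R=0.2, eps=0.05, P_avg=P_avg, N=N)
x0, xs = build_codebook(2000, 20, P_alert, P_avg, alpha, Delta)
avg_power = sum(v * v for x in xs for v in x) / (len(xs) * len(xs[0]))
print(f"alpha = {alpha:.4f}, average power per symbol = {avg_power:.4f} (target {P_avg - Delta})")
```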

V. Achievability 

In this section, we will show that the red alert error exponent stated in Theorem 1 is achievable. We begin by stating useful large deviations bounds that will play a role in the proofs of both the achievability and the converse. Next, we show that any standard codeword plus noise lies at a certain distance from the red alert codeword with high probability. Afterwards, we argue that, with high probability, any standard codeword plus noise is contained in a cone of a certain half-angle that is centered on the red alert codeword. By combining the distance and angle bounds, we can constrain the decoding region for the standard codewords to the intersection of a cone with a shell. The remainder of $\mathbb{R}^n$ can thus be allocated to the decoding region for the red alert codeword, for which we will bound the resulting probability of a missed detection.

A. Large Deviations Bounds 

Our upper and lower bounds on the probability of error are proven by deriving bounds on the size and shape of the decoding regions and then applying Cramér's Theorem to get large deviations bounds. Define $g_X(a)$ to be the moment generating function of a random variable $X$,

$$g_X(a) = \mathbb{E}\big[e^{aX}\big], \qquad (21)$$

and $I_X(b)$ to be the Fenchel-Legendre transform [22, Definition 2.2.2] of $\log(g_X(\cdot))$,

$$I_X(b) = \sup_{a}\big[ab - \log(g_X(a))\big]. \qquad (22)$$

Theorem 2 (Cramér): Let $S_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the normalized sum of $n$ i.i.d. variables $X_1, \ldots, X_n$ with finite mean and rate function $I_X(b)$. Then, for every closed subset $\mathcal{F} \subseteq \mathbb{R}$,

$$\mathbb{P}(S_n \in \mathcal{F}) \le 2\exp\Big(-n\inf_{b \in \mathcal{F}} I_X(b)\Big), \qquad (23)$$

and, for every open subset $\mathcal{G} \subseteq \mathbb{R}$,

$$\liminf_{n \to \infty}\frac{1}{n}\log\mathbb{P}(S_n \in \mathcal{G}) \ge -\inf_{b \in \mathcal{G}} I_X(b). \qquad (24)$$

See, for instance, [22, Theorem 2.2.3] for a proof. We will be particularly interested in how this bound applies to the squared length of i.i.d. Gaussian vectors, which corresponds to setting the $X_i$ to be chi-square random variables (with one degree of freedom). The moment generating function for such random variables is $g_X(a) = \frac{1}{\sqrt{1 - 2a}}$ (for $a < 1/2$), which yields a rate function of $I_X(b) = \frac{1}{2}(b - 1 - \log b)$.
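The chi-square rate function can be checked directly against definition (22). In the Python sketch below (helper names and grid limits are our own assumptions), a plain grid search over $a < 1/2$ stands in for the supremum and is compared with the closed form $\frac{1}{2}(b - 1 - \log b)$; the maximizer is $a^* = (1 - 1/b)/2$.

```python
import math

def chi2_rate_closed_form(b: float) -> float:
    """I_X(b) = (b - 1 - log b) / 2 for a chi-square variable with one degree of freedom."""
    return 0.5 * (b - 1.0 - math.log(b))

def chi2_rate_numeric(b: float, lo: float = -50.0, grid: int = 100_000) -> float:
    """Grid approximation of sup_a [a*b - log g_X(a)], where log g_X(a) = -0.5 * log(1 - 2a).

    lo must lie below the maximizer (1 - 1/b) / 2, which holds here for b >= 1/101.
    """
    hi = 0.5 - 1e-9  # stay strictly inside the domain a < 1/2 of the MGF
    best = -math.inf
    for k in range(grid + 1):
        a = lo + (hi - lo) * k / grid
        best = max(best, a * b + 0.5 * math.log(1.0 - 2.0 * a))
    return best

for b in (0.5, 1.0, 2.0, 3.0):
    print(b, chi2_rate_numeric(b), chi2_rate_closed_form(b))
```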

B. Distance Bounds 

The following lemma formalizes the notion that the squared 
^2-norm of an i.i.d. Gaussian vector concentrates sharply 
around its variance. Thus, for large n, the decoding region 
can be restricted to a thin spherical shell. 

Lemma 2: Let z be a length-n vector with i.i.d. zero-mean 
Gaussian entries of variance N . Then, for any /3 > 0, 

P(||z||2 > nN{l + (3)) < 2cxp - log(l + /?))) 

and, for any < /3 < 1, 

P(||zf < nN{l - /?)) < 2exp (-^( - /? - log(l - . 

See Appendix lAl for the proof. 
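Lemma 2 can be compared with simulation. The Python sketch below (parameter values and helper names are illustrative assumptions) estimates both tail probabilities of $\|\mathbf{z}\|^2$ by Monte Carlo and evaluates the corresponding bounds. For small $n$ the bounds are loose but valid; their exponents are $nI_X(1 \pm \beta)$ for the chi-square rate function above.

```python
import math
import random

def lemma2_bounds(n: int, beta: float):
    """Right-hand sides of Lemma 2: (upper-tail bound, lower-tail bound); the latter needs 0 < beta < 1."""
    upper = 2 * math.exp(-(n / 2) * (beta - math.log(1 + beta)))
    lower = 2 * math.exp(-(n / 2) * (-beta - math.log(1 - beta)))
    return upper, lower

def empirical_tails(n: int, N: float, beta: float, trials: int = 5000, seed: int = 0):
    """Monte Carlo frequencies of ||z||^2 >= nN(1 + beta) and ||z||^2 <= nN(1 - beta)."""
    rng = random.Random(seed)
    sd = math.sqrt(N)
    above = below = 0
    for _ in range(trials):
        sq_norm = sum(rng.gauss(0.0, sd) ** 2 for _ in range(n))
        above += sq_norm >= n * N * (1 + beta)
        below += sq_norm <= n * N * (1 - beta)
    return above / trials, below / trials

print(lemma2_bounds(50, 0.5), empirical_tails(50, 2.0, 0.5))
```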

Recall that the Q-function returns the probability that a scalar Gaussian random variable with mean zero and unit variance is greater than or equal to $t > 0$,

$$Q(t) = \frac{1}{\sqrt{2\pi}}\int_t^{\infty}\exp\left(-\frac{x^2}{2}\right)dx, \qquad (25)$$

and is upper bounded as

$$Q(t) \le \frac{1}{2}\exp\left(-\frac{t^2}{2}\right).$$

The next lemma is about the well-known fact that an i.i.d. Gaussian vector is approximately orthogonal to any fixed vector.

Lemma 3: Let $\mathbf{z}$ be a length-$n$ vector with i.i.d. zero-mean Gaussian entries with variance $N$ and let $\mathbf{a}$ be a length-$n$ vector with $\|\mathbf{a}\|^2 = na$ for some fixed $a > 0$. Then, for any $\delta > 0$ and $n$ large enough,

$$\mathbb{P}\big(|\mathbf{a}^T\mathbf{z}| > \delta\|\mathbf{a}\|^2\big) < \delta. \qquad (26)$$

See Appendix A for the proof.
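Since $\mathbf{a}^T\mathbf{z} \sim \mathcal{N}(0, N\|\mathbf{a}\|^2)$, the probability in (26) equals $2Q\big(\delta\sqrt{na/N}\big) \le \exp\big(-\delta^2 na/(2N)\big)$, which shows why $n$ large enough suffices. The Python sketch below (our own helper names; `a_power` plays the role of the scalar $a$) evaluates $Q$ via `math.erfc`, checks the bound $Q(t) \le \frac{1}{2}e^{-t^2/2}$ at a few points, and tabulates the exact probability in (26).

```python
import math

def q_function(t: float) -> float:
    """Gaussian tail Q(t) = P(X >= t) for X ~ N(0, 1), via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2))

def q_bound(t: float) -> float:
    """Upper bound Q(t) <= (1/2) exp(-t^2 / 2), valid for t >= 0."""
    return 0.5 * math.exp(-t * t / 2)

def lemma3_probability(n: int, a_power: float, N: float, delta: float) -> float:
    """Exact P(|a^T z| >= delta ||a||^2) when ||a||^2 = n * a_power.

    a^T z ~ N(0, N ||a||^2), so the event is |g| >= delta * sqrt(n * a_power / N) with g ~ N(0, 1).
    """
    return 2 * q_function(delta * math.sqrt(n * a_power / N))

for n in (100, 300, 1000):
    print(n, lemma3_probability(n, a_power=1.0, N=1.0, delta=0.1))
```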

In Figure 4, the codebook is illustrated from the perspective of the origin. Using Lemmas 2 and 3, it can be shown that all but a vanishing fraction of codewords have power close to $P_{\text{avg}} - \Delta$ and are nearly orthogonal with respect to any fixed vector. We now characterize how far away a codeword plus noise is from the red alert codeword with high probability.

Lemma 4: For any $\delta > 0$ and $n$ large enough, the distance from the red alert codeword to the codeword for a standard message, $w \in \{1, 2, \ldots, e^{nR}\}$, plus noise is at least $L$ with high probability,

$$\mathbb{P}\big(\|-\mathbf{x}(0) + \mathbf{x}(w) + \mathbf{z}\| > L\big) \ge 1 - \delta \qquad (27)$$

where

$$L = \sqrt{n\Big(P_{\text{alert}} + P_{\text{avg}} + N + 2\sqrt{P_{\text{alert}}(1-\alpha)P_{\text{avg}}} - \Delta - \delta\Big)}.$$

See Appendix A for the proof.

Fig. 4. From the perspective of the origin, most of the codewords are concentrated in an $\epsilon$-shell of power $\alpha P_{\text{avg}} - \Delta$ that is offset away from the origin with power $(1-\alpha)P_{\text{avg}}$. Thus, with high probability, any random codeword meets the power constraint.

C. Angle Bounds

We now upper bound the $n$-dimensional angle between a fixed vector and the same vector plus i.i.d. Gaussian noise.

Lemma 5: Let $\mathbf{z}$ be a length-$n$ vector with i.i.d. zero-mean Gaussian entries with variance $N$ and let $\mathbf{a}$ be a length-$n$ vector with $\|\mathbf{a}\|^2 = na$ for some fixed $a > 0$. For any $\delta > 0$ and $n$ large enough, the probability that the angle between $\mathbf{a}$ and $\mathbf{a} + \mathbf{z}$ exceeds $\cos^{-1}\big(\sqrt{a/(a+N)}\big) + \delta$ is upper bounded by $\delta$,

$$\mathbb{P}\left(\angle(\mathbf{a}, \mathbf{a} + \mathbf{z}) > \cos^{-1}\left(\sqrt{\frac{a}{a+N}}\right) + \delta\right) \le \delta. \qquad (28)$$

See Appendix B for the proof.

In Figure 5, we have depicted the distance $L$ and the angle $\psi$ from the red alert codeword to a standard codeword plus noise. Notice that both the noise and the random components $\mathbf{v}(w)$ of the codewords are (nearly) orthogonal to the axis along which the red alert codeword lies.

Now consider a cone centered on the red alert codeword that 
contains a standard codeword plus noise with high probability. 
The next lemma upper bounds the required half-angle for the 
cone. 

Lemma 6: Let $\mathcal{V}_n(\mathbf{x}(0), \mathbf{0}, \psi)$ denote the cone centered on the red alert codeword with axis running towards the origin and half-angle $\psi$. For any $\delta > 0$, $w \in \{1, 2, \ldots, e^{nR}\}$, and $n$ large enough, if the half-angle $\psi$ is greater than or equal to

$$\cos^{-1}\left(\frac{\sqrt{P_{\text{alert}}} + \sqrt{(1-\alpha)P_{\text{avg}}}}{\sqrt{P_{\text{alert}} + P_{\text{avg}} + 2\sqrt{P_{\text{alert}}(1-\alpha)P_{\text{avg}}} + N - \Delta}}\right) + \delta,$$

then the cone contains the codeword for message $w$ plus noise with high probability, i.e.,

$$\mathbb{P}\big(\mathbf{x}(w) + \mathbf{z} \in \mathcal{V}_n(\mathbf{x}(0), \mathbf{0}, \psi)\big) \ge 1 - \delta. \qquad (29)$$

See Appendix B for the proof.

Fig. 5. With high probability, a Gaussian codeword, a Gaussian noise vector, and the red alert codeword vector are all nearly orthogonal to each other. Conditioning on this event, we can derive the minimum distance $L$ and the angle $\psi$ between a codeword plus noise and the red alert codeword.
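Lemmas 4 and 6 can be checked by simulating a single message. The Python sketch below draws one offset Gaussian codeword and one noise vector, then compares $\|\mathbf{x}(w) + \mathbf{z} - \mathbf{x}(0)\|^2/n$ and the cosine of the angle to the cone axis $\mathbf{1}$ with $L^2/n$ and the cosine of the half-angle in Lemma 6. Parameter values and helper names are illustrative assumptions, and we set $\Delta = 0$ and drop the $\delta$ slack.

```python
import math
import random

def predicted_geometry(P_alert, P_avg, N, alpha, Delta=0.0):
    """Per-dimension squared distance L^2/n and cos(psi) from Lemmas 4 and 6 with delta = 0."""
    along_axis = math.sqrt(P_alert) + math.sqrt((1 - alpha) * P_avg)
    dist_sq = P_alert + P_avg + N + 2 * math.sqrt(P_alert * (1 - alpha) * P_avg) - Delta
    return dist_sq, along_axis / math.sqrt(dist_sq)

def simulated_geometry(n, P_alert, P_avg, N, alpha, Delta=0.0, seed=1):
    """Draw x(w) = sqrt((1 - alpha) P_avg) 1 + v and noise z; measure relative to x(0) = -sqrt(P_alert) 1."""
    rng = random.Random(seed)
    offset = math.sqrt((1 - alpha) * P_avg)
    sd_v, sd_z = math.sqrt(alpha * P_avg - Delta), math.sqrt(N)
    # d = x(w) + z - x(0), coordinate by coordinate
    d = [offset + rng.gauss(0.0, sd_v) + rng.gauss(0.0, sd_z) + math.sqrt(P_alert) for _ in range(n)]
    norm_sq = sum(di * di for di in d)
    cos_angle = sum(d) / math.sqrt(n * norm_sq)  # angle between d and the cone axis 1
    return norm_sq / n, cos_angle

print(predicted_geometry(4.0, 1.0, 1.0, 0.5))
print(simulated_geometry(4000, 4.0, 1.0, 1.0, 0.5))
```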

D. Red Alert Exponent 

Now that we know the decoding region can be confined to a 
conical shell, we can bound the probability of missed detection 
for the red alert codeword. 

Lemma 7: For any rate $R < C$, the following red alert exponent is achievable:

$$E_{\text{alert}}(R) = \frac{P_{\text{alert}} + P_{\text{avg}} + 2\sqrt{P_{\text{alert}}\big(P_{\text{avg}} + N(1 - e^{2R})\big)}}{2N} - R.$$



Proof: Choose $\delta > 0$. In Lemma 4, $L$ is a lower bound on the distance between the red alert codeword and a standard codeword plus noise. From Lemma 6, we have a half-angle $\psi$ that suffices to capture a standard codeword plus noise in the cone $\mathcal{V}_n(\mathbf{x}(0), \mathbf{0}, \psi)$ centered on the red alert codeword. If the received vector lies in the cone and is at least distance $L$ from the red alert codeword, then the decoder assumes the red alert message was not transmitted. Otherwise, it declares that the red alert message was sent. By the union bound over Lemmas 4 and 6, for $\delta \le \epsilon/2$ and $n$ large enough, the probability that a random codeword plus noise, $\mathbf{x}(w) + \mathbf{z}$, leaves this region is at most $\epsilon$. Therefore, the probability of false alarm (averaged over the randomness in the codebook) is upper bounded by $\epsilon$.



If the received vector falls in the decoding region for standard messages, we simply subtract the offset $\sqrt{(1-\alpha)P_{\text{avg}}}\,\mathbf{1}$ and apply a maximum likelihood decoder to make an estimate of the transmitted message. Since the rate of the codebook is chosen to be slightly less than the capacity (for the power level $\alpha P_{\text{avg}} - \Delta$), it is straightforward to show that the average probability of error for a given message is at most $\epsilon$.

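Since the average false alarm probability and average error probability are small, it follows that there exists at least one fixed codebook with a small false alarm probability and average error probability. We now turn to upper bounding the probability of missed detection. Assume the red alert codeword is transmitted. Define

$$\beta = \frac{P_{\text{alert}} + P_{\text{avg}} + 2\sqrt{P_{\text{alert}}(1-\alpha)P_{\text{avg}}} - \Delta - \delta}{N} \qquad (30)$$

where $\Delta$ is specified by step 3) of the codebook construction in Section IV. Since the received vector is $\mathbf{x}(0) + \mathbf{z}$, its distance from the red alert codeword is $\|\mathbf{z}\|$, and $L^2 = nN(1+\beta)$. Using Lemma 2, the probability that the noise pushes the received vector further than $L$ (as specified in Lemma 4) from the red alert codeword can be upper bounded by

$$\mathbb{P}(\|\mathbf{z}\| > L) = \mathbb{P}\big(\|\mathbf{z}\|^2 > nN(1+\beta)\big) \qquad (31)$$

$$\le 2\exp\left(-\frac{n}{2}\big(\beta - \log(1+\beta)\big)\right). \qquad (32)$$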

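The probability that the received vector falls into the cone of half-angle $\psi$ is given by the fraction of surface area of a sphere carved out by the cone. Using Lemma 1, this can be calculated as

$$\mathbb{P}\big(\mathbf{x}(0) + \mathbf{z} \in \mathcal{V}_n(\mathbf{x}(0), \mathbf{0}, \psi)\big) = \frac{\sin^n\psi}{\sqrt{2\pi n}\,\sin\psi\cos\psi}\left(1 + O\!\left(\frac{1}{n}\right)\right). \qquad (33)$$

Pulling terms into the exponent, we get

$$\exp\left(-n\left(-\log(\sin\psi) + \frac{1}{n}\log\big(\sqrt{2\pi n}\,\sin\psi\cos\psi\big)\right) + O\!\left(\frac{1}{n}\right)\right). \qquad (34)$$

For $n$ large enough, we get that the probability is upper bounded by $\exp\big(-n(-\log(\sin\psi) - \delta)\big)$.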

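Since the noise is an i.i.d. Gaussian vector, its magnitude and direction are independent. Therefore, the probability of missed detection is upper bounded as

$$p_{\text{md}} = \mathbb{P}(\|\mathbf{z}\| > L)\,\mathbb{P}\big(\mathbf{x}(0) + \mathbf{z} \in \mathcal{V}_n(\mathbf{x}(0), \mathbf{0}, \psi)\big) \le \exp\left(-\frac{n}{2}\big(\beta - \log(1+\beta) - 2\log(\sin\psi) - 2\delta\big)\right)$$

for $n$ large enough. For $\Delta$ and $\delta$ small enough and $n$ large enough, the exponent $\frac{\beta}{2} - \frac{1}{2}\log(1+\beta) - \log(\sin\psi) - \delta$ can be made arbitrarily close to

$$\begin{aligned}
&\frac{P_{\text{alert}} + P_{\text{avg}} + 2\sqrt{P_{\text{alert}}(1-\alpha)P_{\text{avg}}}}{2N} - \frac{1}{2}\log\left(\frac{P_{\text{alert}} + P_{\text{avg}} + 2\sqrt{P_{\text{alert}}(1-\alpha)P_{\text{avg}}} + N}{N}\right) \\
&\qquad + \frac{1}{2}\log\left(\frac{P_{\text{alert}} + P_{\text{avg}} + 2\sqrt{P_{\text{alert}}(1-\alpha)P_{\text{avg}}} + N}{\alpha P_{\text{avg}} + N}\right) \\
&= \frac{P_{\text{alert}} + P_{\text{avg}} + 2\sqrt{P_{\text{alert}}(1-\alpha)P_{\text{avg}}}}{2N} - \frac{1}{2}\log\left(1 + \frac{\alpha P_{\text{avg}}}{N}\right) \\
&= \frac{P_{\text{alert}} + P_{\text{avg}} + 2\sqrt{P_{\text{alert}}(1-\alpha)P_{\text{avg}}}}{2N} - R - \epsilon,
\end{aligned}$$

where the last step uses (19). Finally, letting $\epsilon \to 0$, we can solve (19) for $\alpha$ in terms of $R$ to get $\alpha = \frac{N}{P_{\text{avg}}}(e^{2R} - 1)$, so that $(1-\alpha)P_{\text{avg}} = P_{\text{avg}} + N(1 - e^{2R})$. Substituting this into the expression above yields the desired result. ■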
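The last steps of the proof are pure algebra, so they can be verified numerically. The Python sketch below (our own helper names, with $\Delta = \delta = \epsilon = 0$) evaluates $\frac{\beta}{2} - \frac{1}{2}\log(1+\beta) - \log(\sin\psi)$ using $\sin^2\psi = (\alpha P_{\text{avg}} + N)/(P_{\text{alert}} + P_{\text{avg}} + 2\sqrt{P_{\text{alert}}(1-\alpha)P_{\text{avg}}} + N)$, and compares it with the closed form of Lemma 7 after substituting $\alpha = \frac{N}{P_{\text{avg}}}(e^{2R} - 1)$. The shorthand `A` is a hypothetical name for the recurring sum in (30).

```python
import math

def exponent_from_proof(P_alert: float, P_avg: float, N: float, alpha: float) -> float:
    """beta/2 - log(1 + beta)/2 - log(sin psi) with Delta = delta = 0, as at the end of the proof."""
    A = P_alert + P_avg + 2 * math.sqrt(P_alert * (1 - alpha) * P_avg)  # recurring sum in (30)
    beta = A / N
    sin_sq_psi = (alpha * P_avg + N) / (A + N)  # sin^2 of the Lemma 6 half-angle
    return beta / 2 - 0.5 * math.log(1 + beta) - 0.5 * math.log(sin_sq_psi)

def lemma7_closed_form(R: float, P_alert: float, P_avg: float, N: float) -> float:
    """Right-hand side of Lemma 7 (and Theorem 1)."""
    gain = 2 * math.sqrt(P_alert * (P_avg + N * (1 - math.exp(2 * R))))
    return (P_alert + P_avg + gain) / (2 * N) - R

P_alert, P_avg, N = 4.0, 1.0, 1.0
for R in (0.0, 0.1, 0.2, 0.3):
    alpha = (N / P_avg) * (math.exp(2 * R) - 1)  # (19) with eps -> 0
    print(R, exponent_from_proof(P_alert, P_avg, N, alpha), lemma7_closed_form(R, P_alert, P_avg, N))
```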
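Note that at $R = 0$, the coherent gain $2\sqrt{P_{\text{alert}}\big(P_{\text{avg}} + N(1 - e^{2R})\big)}$ equals $2\sqrt{P_{\text{alert}}P_{\text{avg}}}$, which is the largest benefit we could hope for. At $R = C$, the coherent gain vanishes, since $P_{\text{avg}} + N(1 - e^{2C}) = 0$ by (13).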

Remark 3: We can interpret our achievability result from a hypothesis testing perspective. Let $\mathcal{H}_0$ denote the event that a standard codeword is transmitted and let $\mathcal{H}_1$ denote the event that the red alert codeword is transmitted. Under $\mathcal{H}_0$, the entries of $\mathbf{y}$ are i.i.d. according to a Gaussian distribution with mean $\sqrt{(1-\alpha)P_{\text{avg}}}$ and variance $\alpha P_{\text{avg}} + N$. Under $\mathcal{H}_1$, the entries are i.i.d. Gaussian with mean $-\sqrt{P_{\text{alert}}}$ and variance $N$. Using the Chernoff-Stein Lemma [23, Theorem 11.8.3], we can bound the missed detection probability of the optimal hypothesis test via the KL divergence between the two distributions, $D\big(\mathcal{N}(\sqrt{(1-\alpha)P_{\text{avg}}}, \alpha P_{\text{avg}} + N)\,\big\|\,\mathcal{N}(-\sqrt{P_{\text{alert}}}, N)\big)$. A bit of calculation will reveal that this KL divergence corresponds exactly to the red alert exponent. One can obtain the same exponent by plugging these distributions into the red alert exponent expression from [3, Lemma 1]. However, this does not in itself constitute a proof, as the results of [3] are for DMCs without cost constraints.
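The "bit of calculation" in Remark 3 can be checked numerically. The Python sketch below uses the standard closed form for the KL divergence between two scalar Gaussians, $D\big(\mathcal{N}(\mu_0,\sigma_0^2)\,\|\,\mathcal{N}(\mu_1,\sigma_1^2)\big) = \frac{1}{2}\big(\frac{\sigma_0^2}{\sigma_1^2} + \frac{(\mu_1-\mu_0)^2}{\sigma_1^2} - 1 + \log\frac{\sigma_1^2}{\sigma_0^2}\big)$, with illustrative parameter values of our own, and compares it with the exponent of Theorem 1.

```python
import math

def kl_gaussian(mu0: float, var0: float, mu1: float, var1: float) -> float:
    """D(N(mu0, var0) || N(mu1, var1)) in nats for scalar Gaussians."""
    return 0.5 * (var0 / var1 + (mu1 - mu0) ** 2 / var1 - 1 + math.log(var1 / var0))

P_alert, P_avg, N, R = 4.0, 1.0, 1.0, 0.2
alpha = (N / P_avg) * (math.exp(2 * R) - 1)
# H0: standard codeword sent; H1: red alert codeword sent (per-symbol laws from Remark 3)
d = kl_gaussian(math.sqrt((1 - alpha) * P_avg), alpha * P_avg + N, -math.sqrt(P_alert), N)
e = (P_alert + P_avg + 2 * math.sqrt(P_alert * (P_avg + N * (1 - math.exp(2 * R))))) / (2 * N) - R
print(f"KL divergence = {d:.6f}, Theorem 1 exponent = {e:.6f}")
```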

VI. Converse 

We now develop an upper bound on the red alert exponent. 
Our bound relies on the fact that, in order to recover the stan- 
dard messages reliably, we must allocate a significant volume 
of the output space for decoding them, which contributes to 
the probability of missed detection. An overview of the main 
steps in the proof is provided below. 

• In Lemma 8, we argue that a constant fraction of the codewords live in a thin shell and strictly satisfy the power and error constraints.

• With high probability, the standard codewords plus noise are concentrated in a thin shell. Lemma 9 establishes this fact as well as the minimum volume required for the decoding region to attain a given probability of error.

• To minimize the probability of missed detection, we should pack this volume into the thin shell to maximize the distance from the red alert codeword (see Figure 6 for an illustration). Lemma 10 bounds the distance and angle from the red alert codeword to the resulting decoding region (see Figure 7 for an illustration).

• Finally, in Lemma 11, we bound the probability that the noise carries the red alert codeword into the decoding region for the standard codewords.
Lemma 8: Assume that a sequence of codebooks satisfies the average block power constraint $P_{\text{avg}}$ and has average probability of error $p_{\text{msg}}$ that tends to zero. Then for any $\gamma > 0$ and $n$ large enough, there exists a shell of width $\gamma$ that contains $e^{n(R-\gamma)}$ codewords, each with probability of error at most $(2/\gamma)p_{\text{msg}}$ and average power at most $P_{\text{avg}}(1-\gamma)^{-1}$. See Appendix C for the proof.

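Lemma 9: Assume that, for some $\gamma, \rho > 0$, $e^{n(R-\gamma)}$ codewords, each with probability of error at most $(2/\gamma)p_{\text{msg}}$, lie in the shell $\mathcal{T}_n(\mathbf{0}, \sqrt{n\rho}, \sqrt{n\rho} + \gamma)$. Then, for $n$ large enough, the decoding region for these codewords must include a subset of the noise-inflated shell $\mathcal{T}_n\big(\mathbf{0}, \sqrt{n(\rho + N - \gamma)}, \sqrt{n(\rho + N + \gamma)}\big)$ with volume at least

$$V_{\min} = e^{n(R-\gamma)}\,\frac{\big(n\pi(N - \gamma)\big)^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)}.$$

See Appendix C for the proof.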

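Lemma 10: Assume that a sequence of codebooks has rate $R$ and an average probability of error $p_{\text{msg}}$ that tends to zero as $n$ increases. Then, for sufficiently small $\epsilon$ and $n$ large enough, the probability of missed detection $p_{\text{md}}$ is lower bounded by the probability that the noise vector has squared norm $\|\mathbf{z}\|^2$ between $L^2 + n\epsilon$ and $L^2 + 2n\epsilon$ and lies at an angle $\angle(\mathbf{z}, \mathbf{1})$ between $\psi(1 - \epsilon)$ and $\psi(1 - 2\epsilon)$, where

$$L^2 = n\Big(P_{\text{alert}} + P_{\text{avg}} + N + 2\sqrt{P_{\text{alert}}\big(P_{\text{avg}} + N(1 - e^{2R})\big)}\Big)$$

$$\psi = \sin^{-1}\left(\sqrt{\frac{Ne^{2R}}{P_{\text{alert}} + P_{\text{avg}} + N + 2\sqrt{P_{\text{alert}}\big(P_{\text{avg}} + N(1 - e^{2R})\big)}}}\right).$$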



Proof: Consider the standard codewords from a red alert codebook. By Lemma 8, for any $\gamma > 0$ and $n$ large enough, at least $e^{n(R-\gamma)}$ codewords must lie in a shell $\mathcal{T}_n(0, \sqrt{n\rho}, \sqrt{n\rho}+\gamma)$ for some $\rho > 0$. Each of these codewords has power at most $P_{\text{avg}}(1-\gamma)^{-1}$ and probability of error at most $(2/\gamma)p_{\text{msg}}$. By Lemma 9, the decoding region for these codewords then includes a subset of the noise-inflated shell $\mathcal{T}_n(0, \sqrt{n(\rho+N-\gamma)}, \sqrt{n(\rho+N+\gamma)})$ with volume at least $V_{\min}$.

To get our lower bound, we need to pack this volume into the noise-inflated shell so as to minimize $p_{\text{md}}$. The noise vector is i.i.d. Gaussian. Hence the probability density that the red alert codeword is pushed to a given point depends only on the distance to that point, and it decreases with that distance. Let $\mathcal{G}_d$ denote the set of all points at distance $d$ or greater from the red alert codeword,
$$ \mathcal{G}_d = \left\{\mathbf{z} : \|\mathbf{z} - \mathbf{x}(0)\| \ge d\right\}. \qquad (35) $$



The optimal volume packing corresponds to the intersection of the set $\mathcal{G}_d$ and the noise shell $\mathcal{T}_n(0, \sqrt{n(\rho+N-\nu)}, \sqrt{n(\rho+N+\nu)})$. Here $d$ is chosen so that the volume of this intersection equals $V_{\min}$. Let $\mathcal{R}$ denote the resulting region; see Figure 6 for an illustration.

Fig. 6. To attain the desired probability of error, the decoding region for the standard codewords must include a subset of the noise-inflated shell with volume at least $V_{\min}$. To minimize the probability of missed detection, we place this volume as far from the red alert codeword (square) as possible. Let $\mathcal{G}_d$ denote the set of points at distance $d$ or greater from the red alert codeword. The decoding region $\mathcal{R}$ is the intersection of $\mathcal{G}_d$ and the shell, where $d$ is chosen to capture volume $V_{\min}$.

Fig. 7. Illustration of successive lower bounds on the probability of missed detection, $p_{\text{md}}$. The red alert codeword is represented by a square and the origin by a circle. The decoding region $\mathcal{R}$ is denoted by a thick line. We would like to characterize the distance $L^*$ and angle $\psi^*$ to the edge of $\mathcal{R}$, represented by the point $\mathbf{v}^*$. To do so, we consider a cone with half-angle $\theta$ (shaded region) with the same volume as $\mathcal{R}$. The intersection of this cone with the outer surface of $\mathcal{R}$ contains a point $\mathbf{v}$ at distance $L \ge L^*$ and angle $\psi \le \psi^*$. In our final lower bound, we only consider one event: the received vector lies in the subset of $\mathcal{R}$ at distance slightly larger than $L$, within an angle between $\psi(1-\epsilon)$ and $\psi(1-2\epsilon)$ from the red alert codeword (illustrated by dark patches).

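Let $\mathcal{R}_{\text{EDGE}}$ denote the set of points in $\mathcal{R}$ that sit at the minimum distance to the red alert codeword,
$$ \mathcal{R}_{\text{EDGE}} = \left\{\mathbf{v} \in \mathcal{R} : \|\mathbf{v} - \mathbf{x}(0)\| = \min_{\mathbf{u} \in \mathcal{R}} \|\mathbf{u} - \mathbf{x}(0)\|\right\}, $$
and let $\mathbf{v}^* \in \mathcal{R}_{\text{EDGE}}$ be any of these points. Let $L^*$ and $\psi^*$ denote the distance and angle from the red alert codeword $\mathbf{x}(0)$ to $\mathbf{v}^*$. We now bound these quantities through a bound on the angle from the origin to $\mathbf{v}^*$.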

Let $\theta^*$ denote the half-angle of a cone, centered on the origin, that contains the region $\mathcal{R}$ (and thus includes $\mathbf{v}^*$). The volume of this cone must be at least equal to that of $\mathcal{R}$, since $\mathcal{R}$ is a subset of the noise shell. Therefore, $\theta^*$ is lower bounded by the half-angle $\theta$ of a cone whose volume equals the volume of $\mathcal{R}$ (see Figure 7 for an illustration). Combining (14) and Lemma 1, for $n$ large enough, the volume of this cone is upper bounded by
$$ \frac{\left(n\pi(\rho + N + \nu)\sin^2\theta\right)^{n/2}}{\Gamma\left(\frac{n}{2}+1\right)}. \qquad (36) $$



Now, since we require this quantity to exceed $V_{\min}$, we can lower bound $\theta$ by
$$ \theta \ge \sin^{-1}\left(e^{R-\gamma}\sqrt{\frac{N-\nu}{\rho+N+\nu}}\right). \qquad (37) $$



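We can further lower bound $\theta$ by setting $\rho$ to its maximum value $P_{\text{avg}}(1-\gamma)^{-1}$:
$$ \theta \ge \sin^{-1}\left(e^{R-\gamma}\sqrt{\frac{N-\nu}{P_{\text{avg}}(1-\gamma)^{-1}+N+\nu}}\right). \qquad (38) $$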



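Thus, for any $\delta > 0$, with $\gamma$ and $\nu$ small enough and $n$ large enough, $\theta^*$ is lower bounded as follows:
$$ \theta^* \ge \theta \ge \sin^{-1}\left(e^{R-\delta}\sqrt{\frac{N}{P_{\text{avg}}+N}}\right). \qquad (39) $$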
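The limiting form of (39), with $\delta \to 0$, is easy to evaluate. The sketch below assumes $P_{\text{avg}} = N = 1$ purely for illustration. The names `theta_lower_bound` and `log_cone_volume` are hypothetical helpers. The sketch checks two things. First, with the slack terms removed and $\rho = P_{\text{avg}}$, the cone bound (36) at this angle has volume exactly $e^{nR}(n\pi N)^{n/2}/\Gamma(\frac{n}{2}+1)$. Second, $\theta$ reaches $\pi/2$ (a half-space) at $R = C = \frac{1}{2}\log(1 + P_{\text{avg}}/N)$.

```python
import math


def theta_lower_bound(rate, p_avg, noise_var):
    """Limiting form of (39): asin(e^R * sqrt(N / (P_avg + N))), valid for R <= C."""
    s = math.exp(rate) * math.sqrt(noise_var / (p_avg + noise_var))
    if s > 1.0 + 1e-12:
        raise ValueError("rate exceeds C = 0.5 * log(1 + P_avg / N)")
    return math.asin(min(s, 1.0))  # clamp round-off at R = C


def log_ball_volume(n, radius_sq):
    """Natural log of the volume of an n-ball with squared radius radius_sq."""
    return 0.5 * n * math.log(math.pi * radius_sq) - math.lgamma(0.5 * n + 1.0)


def log_cone_volume(n, radius_sq, theta):
    """Log of the cone-volume bound (36) with the slack terms removed."""
    return log_ball_volume(n, radius_sq * math.sin(theta) ** 2)


p_avg, noise_var, n = 1.0, 1.0, 100
capacity = 0.5 * math.log1p(p_avg / noise_var)
for rate in (0.0, 0.2, capacity):
    print(rate, theta_lower_bound(rate, p_avg, noise_var))
```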



The distance $L^*$ from $\mathbf{x}(0)$ to $\mathbf{v}^*$ is upper bounded by the distance $L$ to a point $\mathbf{v}$. This point lies on the intersection of two sets: the outer shell, at distance $\sqrt{n(P_{\text{avg}}(1-\gamma)^{-1} + N + \nu)}$ from the origin, and the cone of half-angle $\theta$. Without loss of generality, assume that the red alert codeword is placed at $\mathbf{x}(0) = -\sqrt{\mu}\,\mathbf{1}$ for some $\mu > 0$.¹ The direction of the red alert codeword is not important, since we will always fill the noise shell relative to this direction. Then $L^2$ is given by
$$ L^2 = \left(\sqrt{n\mu} + \cos\theta\sqrt{n\left(P_{\text{avg}}(1-\gamma)^{-1} + N + \nu\right)}\right)^2 + \left(\sin\theta\sqrt{n\left(P_{\text{avg}}(1-\gamma)^{-1} + N + \nu\right)}\right)^2. \qquad (40) $$



For any $\delta > 0$, with $\gamma$ and $\nu$ small enough and $n$ large enough, this quantity is itself upper bounded by
$$ n\left(\mu + P_{\text{avg}} + N + 2\cos\theta\sqrt{\mu(P_{\text{avg}}+N)} + \delta\right). $$
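The sketch below is a coordinate-level check of (40) and of the relation $\sin\psi = r\sin\theta / L$ for the angle seen from $\mathbf{x}(0)$. It builds $\mathbf{x}(0) = -\sqrt{\mu}\,\mathbf{1}$ explicitly in $\mathbb{R}^n$. It then places a point $\mathbf{v}$ at radius $r$ and angle $\theta$ from the all-ones axis, and compares the result with the closed forms. The dimension, $\mu$, and $\theta$ are arbitrary illustrative values, and the outer radius of (40) is abstracted as the parameter `radius`.

```python
import math


def distance_and_sin_angle(n, mu, radius, theta):
    """Coordinate-wise L^2 and sin(psi) for x(0) = -sqrt(mu) * 1 and a point v
    at distance `radius` from the origin, at angle theta from the all-ones axis."""
    u = [1.0 / math.sqrt(n)] * n                       # unit vector along 1
    w = [(1.0 if i == 0 else 0.0) - 1.0 / n for i in range(n)]
    w_norm = math.sqrt(sum(c * c for c in w))
    w = [c / w_norm for c in w]                        # unit vector orthogonal to 1
    v = [radius * (math.cos(theta) * u[i] + math.sin(theta) * w[i]) for i in range(n)]
    diff = [v[i] + math.sqrt(mu) for i in range(n)]    # v - x(0)
    l_sq = sum(d * d for d in diff)
    # Angle at x(0) between the axis towards the origin (+1 direction) and v - x(0).
    cos_psi = sum(diff[i] * u[i] for i in range(n)) / math.sqrt(l_sq)
    return l_sq, math.sqrt(max(0.0, 1.0 - cos_psi ** 2))


n, mu, theta = 50, 2.0, 0.8
radius = math.sqrt(n * 2.0)  # plays the role of sqrt(n(P_avg + N)) with P_avg + N = 2
l_sq, sin_psi = distance_and_sin_angle(n, mu, radius, theta)
closed = (math.sqrt(n * mu) + radius * math.cos(theta)) ** 2 + (radius * math.sin(theta)) ** 2
print(l_sq, closed, sin_psi)
```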

The half-angle $\psi$ of a cone, centered on the red alert codeword, that contains the point $\mathbf{v}$ is lower bounded by
$$ \sin^{-1}\left(\frac{\sin\theta\sqrt{n\left(P_{\text{avg}}(1-\gamma)^{-1} + N + \nu\right)}}{\sqrt{n\left(\mu + P_{\text{avg}} + N + 2\cos\theta\sqrt{\mu(P_{\text{avg}}+N)} + \delta\right)}}\right). $$
For any $\eta > 0$, with $\gamma$, $\nu$, and $\delta$ small enough and $n$ large enough, this quantity is itself lower bounded by
$$ \sin^{-1}\left((1-\eta)\sin\theta\sqrt{\frac{P_{\text{avg}}+N}{\mu + P_{\text{avg}} + N + 2\cos\theta\sqrt{\mu(P_{\text{avg}}+N)} + \delta}}\right). $$

¹It is straightforward to prove that placing the red alert codeword exactly at the origin is suboptimal.



Increasing the distance $L^*$ from $\mathbf{x}(0)$ to $\mathbf{v}^*$ decreases the probability of missed detection. At the same time, the angle $\psi^*$ decreases. Thus, by setting $\mu = P_{\text{alert}}$, we further lower bound the probability of missed detection. Using the relation $\sin^2\theta + \cos^2\theta = 1$ together with (39), we find that $\cos\theta \le \sqrt{1 - e^{2R-2\delta}\frac{N}{P_{\text{avg}}+N}}$. Plugging in $\mu$ and $\theta$, we obtain the following upper bound on $(L^*)^2$:
$$ n\left(P_{\text{alert}} + P_{\text{avg}} + N + 2\sqrt{P_{\text{alert}}\left(P_{\text{avg}} + N(1 - e^{2R-2\delta})\right)} + \delta\right) $$
and the following lower bound on $\psi^*$:
$$ \sin^{-1}\sqrt{\frac{(1-\eta)^2 N e^{2R-2\delta}}{P_{\text{alert}} + P_{\text{avg}} + N + 2\sqrt{P_{\text{alert}}\left(P_{\text{avg}} + N(1 - e^{2R-2\delta})\right)}}}. $$

Finally, take $\epsilon$ small enough (but greater than 0 for finite $n$) and $n$ large enough. Then the optimal packing contains all points whose squared distance from the red alert codeword is between $L^2 + n\epsilon$ and $L^2 + 2n\epsilon$ and whose angle is between $\psi(1-\epsilon)$ and $\psi(1-2\epsilon)$. Here $L$ and $\psi$ are as in the statement of the lemma. Thus, the probability of missed detection is lower bounded by the probability that the noise falls into this region. ■

Lemma 11: For any rate $R$, the red alert exponent is upper bounded by
$$ E(R) \le \frac{P_{\text{alert}} + P_{\text{avg}} + 2\sqrt{P_{\text{alert}}\left(P_{\text{avg}} + N(1 - e^{2R})\right)}}{2N} - R. $$

Proof: Lemma 10 established a lower bound on the probability of missed detection: the probability of the event that the noise has squared length between $L^2 + n\epsilon$ and $L^2 + 2n\epsilon$ and angle between $\psi(1-\epsilon)$ and $\psi(1-2\epsilon)$. Here $\epsilon$ tends to 0 as $n$ tends to infinity. We now lower bound the probability of this event. Define
$$ \beta = \frac{P_{\text{alert}} + P_{\text{avg}} + 2\sqrt{P_{\text{alert}}\left(P_{\text{avg}} + N(1 - e^{2R})\right)}}{N}. \qquad (41) $$

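Since the magnitude and angle of an i.i.d. Gaussian vector are independent, the probability of missed detection is lower bounded as follows:
$$ p_{\text{md}} \ge P\left(L^2 + n\epsilon < \|\mathbf{z}\|^2 < L^2 + 2n\epsilon\right) \cdot P\left(\mathbf{z} \in \mathcal{V}_n(0, \mathbf{1}, \psi(1-\epsilon)) \setminus \mathcal{V}_n(0, \mathbf{1}, \psi(1-2\epsilon))\right). $$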

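By Lemma 1, for $n$ large enough, the second term in the product can be lower bounded by
$$ \left(\sin(\psi(1-\epsilon))\right)^n - \left(\sin(\psi(1-2\epsilon))\right)^n, \qquad (42) $$
which, for $n$ large enough, is itself lower bounded by
$$ \left((1-\epsilon)^3\sin(\psi(1-\epsilon))\right)^n \qquad (43) $$
$$ = \exp\left(-n\left(-\log\left((1-\epsilon)^3\sin(\psi(1-\epsilon))\right)\right)\right). \qquad (44) $$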
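The step from (42) to (43) only needs $n$ to be large enough that the subtracted term in (42) is negligible. The sketch below makes this "$n$ large enough" qualifier concrete for a few hypothetical values of $\psi$ and $\epsilon$. For $n = 5$ the inequality can fail, while for $n = 200$ it holds across the grid tested.

```python
import math


def angle_band_bound_holds(n, psi, eps):
    """Check (sin a)^n - (sin b)^n >= ((1 - eps)^3 sin a)^n with a = psi(1-eps), b = psi(1-2eps)."""
    a, b = psi * (1.0 - eps), psi * (1.0 - 2.0 * eps)
    band = math.sin(a) ** n - math.sin(b) ** n        # expression (42)
    lower = ((1.0 - eps) ** 3 * math.sin(a)) ** n     # expression (43)
    return band >= lower


for n in (5, 50, 200):
    print(n, angle_band_bound_holds(n, 1.0, 0.05))
```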



Now, substituting in the lower bound on $\psi$ from Lemma 10, we arrive at the following lower bound:
$$ \exp\left(-n\left(-R + \tfrac{1}{2}\log(1+\beta) - 3\log(1-\epsilon)\right)\right). \qquad (45) $$

Using the upper bound on $L$ from Lemma 10, we apply Theorem 2 for chi-square random variables. Since $\epsilon$ and $\nu$ go to zero as $n$ goes to infinity, it follows that
$$ \liminf_{n\to\infty} -\frac{1}{n}\log P\left(L^2 + n\epsilon < \|\mathbf{z}\|^2 < L^2 + 2n\epsilon\right) \qquad (46) $$
$$ \le \frac{\beta}{2} - \frac{1}{2}\log(1+\beta). \qquad (47) $$



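Combining this with the lower bound on the angle event in (45), the exponent of the probability of missed detection is upper bounded by
$$ \liminf_{n\to\infty} -\frac{1}{n}\log p_{\text{md}} \qquad (48) $$
$$ \le \frac{\beta}{2} - \frac{1}{2}\log(1+\beta) + \frac{1}{2}\log(1+\beta) - R \qquad (49) $$
$$ = \frac{P_{\text{alert}} + P_{\text{avg}} + 2\sqrt{P_{\text{alert}}\left(P_{\text{avg}} + N(1 - e^{2R})\right)}}{2N} - R, $$
as desired. ■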
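The two exponents combined in (49) can be computed directly. The sketch below uses a hypothetical helper, `red_alert_terms`, and the standard library only. It evaluates three quantities: $\beta$ from (41); the chi-square exponent $\frac{1}{2}(\beta - \log(1+\beta))$ at $b = 1+\beta$; and the angular exponent $-\log\sin\psi = -R + \frac{1}{2}\log(1+\beta)$, which uses $\sin^2\psi = e^{2R}/(1+\beta)$ from Lemma 10 and (41). Their sum collapses to $\beta/2 - R$, the right-hand side of Lemma 11. The parameter values are illustrative.

```python
import math


def red_alert_terms(p_alert, p_avg, noise_var, rate):
    """Return (beta, radial, angular) from the proof of Lemma 11.

    radial  = (beta - log(1 + beta)) / 2, the chi-square exponent at b = 1 + beta
    angular = -R + log(1 + beta) / 2, i.e. -log(sin psi) with sin^2 psi = e^{2R} / (1 + beta)
    """
    inner = p_avg + noise_var * (1.0 - math.exp(2.0 * rate))
    if inner < -1e-12:
        raise ValueError("rate exceeds capacity")
    beta = (p_alert + p_avg + 2.0 * math.sqrt(p_alert * max(inner, 0.0))) / noise_var  # (41)
    radial = 0.5 * (beta - math.log1p(beta))
    angular = -rate + 0.5 * math.log1p(beta)
    return beta, radial, angular


p_alert, p_avg, noise_var, rate = 2.0, 1.0, 1.0, 0.2
beta, radial, angular = red_alert_terms(p_alert, p_avg, noise_var, rate)
print(beta, radial, angular, radial + angular, beta / 2 - rate)
```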



VII. Plots 
Fig. 8. Optimal red alert exponent for $P_{\text{avg}} = -5$ dB with $P_{\text{alert}} = P_{\text{avg}}, 2P_{\text{avg}}, 3P_{\text{avg}}$. An upper bound on the point-to-point AWGN error exponent is provided for comparison.

In Figures 8 and 9, we have plotted the optimal red alert exponent for $P_{\text{avg}} = -5$ dB and $P_{\text{avg}} = 15$ dB, respectively, with red alert power constraints $P_{\text{alert}} = P_{\text{avg}}$, $2P_{\text{avg}}$, and $3P_{\text{avg}}$. For comparison, we have also plotted an upper bound on the AWGN point-to-point error exponent from [20, Equation 4]. Notice that the red alert exponent can be quite large at capacity, even when the red alert power constraint is equal to the average power constraint.
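The endpoints of the curves in Figures 8 and 9 can be reproduced by evaluating the right-hand side of Lemma 11. The sketch below treats that expression as the plotted optimal exponent and assumes $N = 1$, so that the dB values are signal-to-noise ratios. The helper name is hypothetical. At $R = C$ the square-root term vanishes, since $P_{\text{avg}} + N(1 - e^{2C}) = 0$. The bound therefore reduces to $(P_{\text{alert}} + P_{\text{avg}})/(2N) - C$, which stays above zero for these parameters.

```python
import math


def red_alert_exponent_bound(p_alert, p_avg, noise_var, rate):
    """Right-hand side of Lemma 11, in nats per channel use."""
    inner = p_avg - noise_var * math.expm1(2.0 * rate)  # P_avg + N(1 - e^{2R})
    if inner < -1e-12:
        raise ValueError("rate exceeds capacity")
    root = math.sqrt(p_alert * max(inner, 0.0))         # clamp round-off at R = C
    return (p_alert + p_avg + 2.0 * root) / (2.0 * noise_var) - rate


noise_var = 1.0  # assumption: powers in dB are measured relative to N = 1
for p_avg_db in (-5.0, 15.0):
    p_avg = 10.0 ** (p_avg_db / 10.0)
    capacity = 0.5 * math.log1p(p_avg / noise_var)
    for k in (1, 2, 3):
        e0 = red_alert_exponent_bound(k * p_avg, p_avg, noise_var, 0.0)
        ec = red_alert_exponent_bound(k * p_avg, p_avg, noise_var, capacity)
        print(f"P_avg={p_avg_db:+.0f} dB, P_alert={k}P_avg: E(0)={e0:.3f}, E(C)={ec:.3f}")
```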
Fig. 9. Optimal red alert exponent for $P_{\text{avg}} = 15$ dB with $P_{\text{alert}} = P_{\text{avg}}, 2P_{\text{avg}}, 3P_{\text{avg}}$. An upper bound on the point-to-point AWGN error exponent is provided for comparison.



VIII. Conclusions 

We have developed sharp bounds on the error exponent for distinguishing a single special message from $2^{nR}$ standard messages over an AWGN channel. As discussed in the introduction, these bounds can be used to characterize the performance of certain data streaming architectures, where each bit must be decoded after a given delay. An interesting question for future study is how well a single special message can be protected at a given finite blocklength, i.e., understanding the limits of unequal error protection in the non-asymptotic regime.

Appendix A
Proofs for Section IV-B

Proof of Lemma 2: The squared Euclidean distance $\|\mathbf{z}\|^2$ is the sum of $n$ i.i.d. squared Gaussian random variables with variance $N$. Therefore, $\frac{1}{N}\|\mathbf{z}\|^2$ is the sum of $n$ i.i.d. chi-square random variables. Applying Theorem 2 and plugging in the chi-square rate function $I_Z(b) = \frac{1}{2}(b - 1 - \log b)$, it follows that
$$ P\left(\|\mathbf{z}\|^2 \ge nNb\right) \le 2\exp\left(-\frac{n}{2}(b - 1 - \log b)\right). \qquad (50) $$
Substituting in $b = 1 + \beta$ yields the first bound. Substituting $b = 1 - \beta$, with the event reversed to $\|\mathbf{z}\|^2 \le nNb$, yields the second. ■
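The bound (50) can be checked numerically on the upper tail ($b > 1$). The sketch below assumes an even $n$, for which the chi-square tail has the closed form $P(\chi^2_n \ge x) = e^{-x/2}\sum_{j<n/2}(x/2)^j/j!$. It compares this tail with the one-sided Chernoff bound $\exp(-\frac{n}{2}(b - 1 - \log b))$, which is (50) without the factor 2. Because $\|\mathbf{z}\|^2/N$ is chi-square with $n$ degrees of freedom, the comparison does not depend on $N$. The helper names are hypothetical.

```python
import math


def chi2_upper_tail_even(n, x):
    """Exact P(chi^2_n >= x) for even n: e^{-x/2} * sum_{j < n/2} (x/2)^j / j!."""
    if n % 2:
        raise ValueError("this closed form needs even n")
    h = x / 2.0
    term = math.exp(-h)       # j = 0 term of the Poisson(h) sum
    total = 0.0
    for j in range(n // 2):
        total += term
        term *= h / (j + 1)   # advance to the (j + 1)-th term
    return total


def chernoff_chi2_bound(n, b):
    """exp(-n * I(b)) with the chi-square rate function I(b) = (b - 1 - log b) / 2."""
    return math.exp(-0.5 * n * (b - 1.0 - math.log(b)))


# P(||z||^2 >= n N b) = P(chi^2_n >= n b)
for n in (10, 50, 200):
    for b in (1.2, 1.5, 2.0):
        print(n, b, chi2_upper_tail_even(n, n * b), chernoff_chi2_bound(n, b))
```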
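Proof of Lemma 3: First, we write the probability that $|\mathbf{a}^T\mathbf{z}|$ is greater than $t$ in terms of the Q-function,
$$ P\left(|\mathbf{a}^T\mathbf{z}| > t\right) = 2Q\left(\frac{t}{\sqrt{N\|\mathbf{a}\|^2}}\right) \qquad (51) $$
$$ \le \exp\left(-\frac{t^2}{2N\|\mathbf{a}\|^2}\right). \qquad (52) $$
Substituting $t = \delta\|\mathbf{a}\|^2$ yields
$$ P\left(|\mathbf{a}^T\mathbf{z}| > \delta\|\mathbf{a}\|^2\right) \le \exp\left(-\frac{\delta^2\|\mathbf{a}\|^2}{2N}\right), \qquad (53) $$
which can be driven arbitrarily close to zero for $n$ large enough. ■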
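Both steps of (51) to (53) are easy to check numerically. The sketch below uses the identity $Q(x) = \frac{1}{2}\operatorname{erfc}(x/\sqrt{2})$ via Python's `math.erfc`. It makes a hypothetical choice $\mathbf{a} = \sqrt{P}\,\mathbf{1}$, for which $\|\mathbf{a}\|^2 = nP$ grows with $n$, so the bound vanishes as the proof requires. The helper names are illustrative.

```python
import math


def gaussian_q(x):
    """Q(x) = P(W > x) for W ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def inner_product_tail(delta, noise_var, a_norm_sq):
    """Exact P(|a^T z| > delta ||a||^2) for z ~ N(0, N I), using (51)."""
    t = delta * a_norm_sq
    return 2.0 * gaussian_q(t / math.sqrt(noise_var * a_norm_sq))


def inner_product_tail_bound(delta, noise_var, a_norm_sq):
    """Right-hand side of (53): exp(-delta^2 ||a||^2 / (2N))."""
    return math.exp(-delta ** 2 * a_norm_sq / (2.0 * noise_var))


P, noise_var, delta = 1.0, 1.0, 0.1
for n in (100, 1000, 10000):
    print(n, inner_product_tail(delta, noise_var, n * P),
          inner_product_tail_bound(delta, noise_var, n * P))
```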
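Proof of Lemma 4: We simply wish to bound the length of the vector from the special codeword to a standard codeword plus noise, $-\mathbf{x}(0) + \mathbf{x}(w) + \mathbf{z}$. By expanding terms, we obtain:
$$ \|-\mathbf{x}(0) + \mathbf{x}(w) + \mathbf{z}\|^2 = \left\|\left(\sqrt{P_{\text{alert}}} + \sqrt{(1-\alpha)P_{\text{avg}}}\right)\mathbf{1} + \mathbf{v}(w) + \mathbf{z}\right\|^2 \qquad (54) $$
$$ = \left(\sqrt{P_{\text{alert}}} + \sqrt{(1-\alpha)P_{\text{avg}}}\right)^2\|\mathbf{1}\|^2 \qquad (55) $$
$$ \quad + 2\left(\sqrt{P_{\text{alert}}} + \sqrt{(1-\alpha)P_{\text{avg}}}\right)\mathbf{1}^T\left(\mathbf{v}(w) + \mathbf{z}\right) + \|\mathbf{v}(w) + \mathbf{z}\|^2. \qquad (56) $$
The first term is $n\left(P_{\text{alert}} + (1-\alpha)P_{\text{avg}} + 2\sqrt{P_{\text{alert}}(1-\alpha)P_{\text{avg}}}\right)$.

The second term is the inner product of the fixed vector $2\left(\sqrt{P_{\text{alert}}} + \sqrt{(1-\alpha)P_{\text{avg}}}\right)\mathbf{1}$ and the i.i.d. Gaussian vector $\mathbf{v}(w) + \mathbf{z}$; the latter is Gaussian since $\mathbf{v}(w)$ is an element of a random Gaussian codebook. Thus, using Lemma 3, it can be shown that the probability that this inner product is less than $-n\delta/2$ is at most $\delta/2$ for $n$ large enough.

The third term is the squared norm of an i.i.d. Gaussian vector with mean zero and variance $\alpha P_{\text{avg}} + N - \lambda$. By Lemma 2, $\|\mathbf{v}(w) + \mathbf{z}\|^2$ is less than $n(\alpha P_{\text{avg}} + N - \lambda - \delta/2)$ with probability at most $\delta/2$ for $n$ large enough. Combining these three bounds completes the proof. ■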
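A quick Monte Carlo check of the expansion (54) to (56): the cross term averages out, and the normalized squared distance concentrates around $P_{\text{alert}} + P_{\text{avg}} + N + 2\sqrt{P_{\text{alert}}(1-\alpha)P_{\text{avg}}} - \lambda$. The sketch assumes the offset structure read off from the proof above: $-\mathbf{x}(0) + \mathbf{x}(w) = (\sqrt{P_{\text{alert}}} + \sqrt{(1-\alpha)P_{\text{avg}}})\mathbf{1} + \mathbf{v}(w)$, with $\mathbf{v}(w)$ i.i.d. $\mathcal{N}(0, \alpha P_{\text{avg}} - \lambda)$. The parameter values and the seed are arbitrary illustrative choices.

```python
import math
import random


def normalized_sq_distance(n, p_alert, p_avg, noise_var, alpha, lam, seed=0):
    """One Monte Carlo draw of ||-x(0) + x(w) + z||^2 / n under the offset structure above."""
    rng = random.Random(seed)
    offset = math.sqrt(p_alert) + math.sqrt((1.0 - alpha) * p_avg)  # coefficient of 1
    sd_v = math.sqrt(alpha * p_avg - lam)                            # std of v(w) entries
    sd_z = math.sqrt(noise_var)                                      # std of noise entries
    total = 0.0
    for _ in range(n):
        coord = offset + rng.gauss(0.0, sd_v) + rng.gauss(0.0, sd_z)
        total += coord * coord
    return total / n


def predicted_sq_distance(p_alert, p_avg, noise_var, alpha, lam):
    """First plus third term of (55)-(56), divided by n."""
    return (p_alert + p_avg + noise_var
            + 2.0 * math.sqrt(p_alert * (1.0 - alpha) * p_avg) - lam)


params = dict(p_alert=2.0, p_avg=1.0, noise_var=1.0, alpha=0.5, lam=0.1)
est = normalized_sq_distance(20000, **params)
pred = predicted_sq_distance(**params)
print(est, pred)
```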

Appendix B
Proofs for Section IV-C

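Proof of Lemma 5: The angle between $\mathbf{a}$ and $\mathbf{a} + \mathbf{z}$ is
$$ \angle(\mathbf{a}, \mathbf{a} + \mathbf{z}) = \cos^{-1}\left(\frac{\mathbf{a}^T(\mathbf{a} + \mathbf{z})}{\|\mathbf{a}\|\,\|\mathbf{a} + \mathbf{z}\|}\right). \qquad (57) $$
By Lemma 3, for any $\nu > 0$ and $n$ large enough, the probability that $\mathbf{a}^T\mathbf{z} > \nu\|\mathbf{a}\|^2$ is at most $\nu$. Therefore, since $\|\mathbf{a}\|^2 = na$, we have $\mathbf{a}^T(\mathbf{a} + \mathbf{z}) > (1+\nu)na$ with probability at most $\nu$. Combining Lemmas 2 and 3, we can also show that the probability that $\|\mathbf{a} + \mathbf{z}\| < (1-\nu)\sqrt{n(a+N)}$ is at most $\nu$ for $n$ large enough. Thus, the probability that
$$ \frac{\mathbf{a}^T(\mathbf{a} + \mathbf{z})}{\|\mathbf{a}\|\,\|\mathbf{a} + \mathbf{z}\|} > \frac{(1+\nu)na}{\sqrt{na}\,(1-\nu)\sqrt{n(a+N)}} \qquad (58) $$
$$ = \frac{1+\nu}{1-\nu}\cdot\frac{1}{\sqrt{1 + N/a}} \qquad (59) $$
is at most $2\nu$. Choosing $\nu$ small enough yields the desired result. ■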
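Numerically, the cosine in (57) concentrates at $1/\sqrt{1 + N/a}$, the $\nu \to 0$ limit of (59). The sketch below takes $\mathbf{a} = \sqrt{a}\,\mathbf{1}$, which gives $\|\mathbf{a}\|^2 = na$ as in the proof. This direction, the seed, and the values of $a$ and $N$ are illustrative assumptions.

```python
import math
import random


def cos_angle_a_plus_z(n, a, noise_var, seed=0):
    """cos of the angle between a = sqrt(a) * 1 and a + z, with z i.i.d. N(0, N)."""
    rng = random.Random(seed)
    sa = math.sqrt(a)
    sd = math.sqrt(noise_var)
    dot = 0.0      # accumulates a^T (a + z)
    norm_sq = 0.0  # accumulates ||a + z||^2
    for _ in range(n):
        y = sa + rng.gauss(0.0, sd)
        dot += sa * y
        norm_sq += y * y
    return dot / (math.sqrt(n * a) * math.sqrt(norm_sq))


a, noise_var = 1.0, 1.0
print(cos_angle_a_plus_z(20000, a, noise_var), 1.0 / math.sqrt(1.0 + noise_var / a))
```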
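Proof of Lemma 6: The angle between the axis of the cone and the standard codeword plus noise is
$$ \angle\left(-\mathbf{x}(0), -\mathbf{x}(0) + \mathbf{x}(w) + \mathbf{z}\right) = \cos^{-1}(u), \qquad (60) $$
$$ u = \frac{(-\mathbf{x}(0))^T\left(-\mathbf{x}(0) + \mathbf{x}(w) + \mathbf{z}\right)}{\|\mathbf{x}(0)\|\,\|-\mathbf{x}(0) + \mathbf{x}(w) + \mathbf{z}\|}. \qquad (61) $$
Since $\cos^{-1}(u)$ is a decreasing function of $u$, an upper bound on the angle can be obtained by lower bounding $u$. We will do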
on the angle can be obtained by lower bounding u. We will do 
this by lower bounding the numerator and upper bounding the denominator (with high probability). Expanding the numerator yields:
$$ (-\mathbf{x}(0))^T\left(-\mathbf{x}(0) + \mathbf{x}(w) + \mathbf{z}\right) \qquad (62) $$
$$ = \sqrt{P_{\text{alert}}}\,\mathbf{1}^T\left(\left(\sqrt{P_{\text{alert}}} + \sqrt{(1-\alpha)P_{\text{avg}}}\right)\mathbf{1} + \mathbf{v}(w) + \mathbf{z}\right) \qquad (63) $$
$$ = n\sqrt{P_{\text{alert}}}\left(\sqrt{P_{\text{alert}}} + \sqrt{(1-\alpha)P_{\text{avg}}}\right) + \sqrt{P_{\text{alert}}}\,\mathbf{1}^T\left(\mathbf{v}(w) + \mathbf{z}\right). \qquad (64) $$
The first term is simply $n\sqrt{P_{\text{alert}}}\left(\sqrt{P_{\text{alert}}} + \sqrt{(1-\alpha)P_{\text{avg}}}\right)$. The second term is the inner product of a fixed vector and an i.i.d. Gaussian vector. Thus, using Lemma 3, it can be shown that for any $\nu > 0$ and $n$ large enough, the probability that the second term is less than $-\nu n$ is at most $\nu$.

The denominator is composed of two terms. The first is simply $\|\mathbf{x}(0)\| = \sqrt{nP_{\text{alert}}}$. Following the proof of Lemma 4, it can be shown that the second term $\|-\mathbf{x}(0) + \mathbf{x}(w) + \mathbf{z}\|$ is greater than
$$ \sqrt{n\left(P_{\text{alert}} + P_{\text{avg}} + N + 2\sqrt{P_{\text{alert}}(1-\alpha)P_{\text{avg}}} + \nu\right)} $$
with probability at most $\nu$. Combining these bounds, we get that the probability that $u$ is less than
$$ \frac{\sqrt{P_{\text{alert}}} + \sqrt{(1-\alpha)P_{\text{avg}}} - \nu/\sqrt{P_{\text{alert}}}}{\sqrt{P_{\text{alert}} + P_{\text{avg}} + N + 2\sqrt{P_{\text{alert}}(1-\alpha)P_{\text{avg}}} + \nu}} \qquad (65) $$
is at most $2\nu$. Choose $\nu$ small enough relative to $\delta$. Then, so long as the half-angle $\psi$ is greater than or equal to
$$ \cos^{-1}\left(\frac{\sqrt{P_{\text{alert}}} + \sqrt{(1-\alpha)P_{\text{avg}}} - \delta}{\sqrt{P_{\text{alert}} + P_{\text{avg}} + N + 2\sqrt{P_{\text{alert}}(1-\alpha)P_{\text{avg}}} + \delta}}\right), $$
the cone contains $\mathbf{x}(w) + \mathbf{z}$ with probability at least $1 - \delta$ for $n$ large enough. Applying the trigonometric identity $\sin^2\psi + \cos^2\psi = 1$ completes the proof. ■

Appendix C
Proofs for Section VI

Proof of Lemma 8: Observe that at least one codeword has power at most $P_{\text{avg}}$; otherwise the average would be larger than $P_{\text{avg}}$. If we remove this codeword's contribution from the average, the remaining codewords have average power at most $P_{\text{avg}}\frac{e^{nR}}{e^{nR}-1}$. We can then find a codeword whose power is at most $P_{\text{avg}}\frac{e^{nR}}{e^{nR}-1}$. Removing this codeword yields an average of at most $P_{\text{avg}}\frac{e^{nR}}{e^{nR}-2}$. Continuing this process, we can remove $\gamma e^{nR}$ codewords that each have power at most $P_{\text{avg}}(1-\gamma)^{-1}$. By the same argument, we can find $(1 - \gamma/2)e^{nR}$ codewords that each have probability of error at most $(2/\gamma)p_{\text{msg}}$. Therefore, at least $(\gamma/2)e^{nR}$ codewords must satisfy both constraints simultaneously.

The selected codewords live in the sphere of radius $\sqrt{nP_{\text{avg}}(1-\gamma)^{-1}}$. We partition this sphere into shells of width $\gamma$ each. It follows that at least one of these shells must contain
$$ \frac{\gamma^2 e^{nR}}{2\sqrt{nP_{\text{avg}}(1-\gamma)^{-1}}} $$
codewords. Finally, select $n$ large enough so that $e^{n(R-\gamma)} < \frac{\gamma^2 e^{nR}}{2\sqrt{nP_{\text{avg}}(1-\gamma)^{-1}}}$. ■

Proof of Lemma 9: Assume that one of the codewords from the shell $\mathcal{T}_n(0, \sqrt{n\rho}, \sqrt{n\rho}+\gamma)$ is transmitted. By Lemma 2, for any $\gamma > 0$ and $n$ large enough, $\|\mathbf{y}\|$ falls outside the range from $\sqrt{n(\rho+N-\gamma)}$ to $\sqrt{n(\rho+N+\gamma)}$ with probability at most $p_{\text{msg}}$. If the noise lands outside this "noise shell," then we will assume that the transmitted codeword is decoded correctly. However, each codeword still needs to capture probability at least $1 - (1 + 2/\gamma)p_{\text{msg}}$ inside the shell to ensure that its error probability does not exceed $(2/\gamma)p_{\text{msg}}$.

Now, consider the volume required for decoding a single codeword reliably. Since the noise is i.i.d. Gaussian, its probability distribution is rotationally invariant. This implies that the shape that uses the least volume to capture a given probability is a sphere centered on the codeword. Let $\sqrt{n\nu}$ be the radius of this sphere. By Lemma 2, if $\nu < N$, the probability that the noise falls inside this sphere goes to zero exponentially in $n$, which implies that the probability of error goes to one. Therefore, for $n$ large enough, the probability of error would exceed the desired probability of error (which is assumed to be bounded away from one). Using (14), we get that the decoding region of each codeword must have volume at least
$$ \frac{\left(n\pi(N-\gamma)\right)^{n/2}}{\Gamma\left(\frac{n}{2}+1\right)} $$
for any $\gamma > 0$. We find that we will need a volume of at least
$$ e^{n(R-\gamma)}\,\frac{\left(n\pi(N-\gamma)\right)^{n/2}}{\Gamma\left(\frac{n}{2}+1\right)} \qquad (66) $$
to reliably decode these codewords. ■

Appendix D
Offset Codes Versus Conical Codes

We now develop some intuition for why the offset construction of Section IV is better than the conical construction we used in our earlier work [21]. The difference between these two constructions is easier to understand in a discrete setting, so we will analyze the corresponding constructions for a BSC with crossover probability $p$. For ease of analysis, we will calculate rate in bits per channel use (rather than nats per channel use).

First, recall that the BSC red alert exponent can be attained using a fixed composition codebook. Specifically, each of the $2^{nR}$ codewords is drawn independently and uniformly from the set of weight-$nq$ binary sequences. If the rate is less than the induced mutual information, the average probability of error can be driven to zero:
$$ R < I(X;Y) = h_B(q * p) - h_B(p). \qquad (67) $$


The red alert codeword is taken to be the all-zeros vector. The decoder runs a hypothesis test between the two possible output distributions, Bernoulli($p$) and Bernoulli($q * p$), where $q * p = q(1-p) + (1-q)p$ denotes binary convolution. The error exponent for the probability of missed detection is the KL divergence between the two distributions, $D(q * p\,\|\,p)$. As shown in [4], this is the optimal red alert exponent.

We can construct a conical code of parameter $q > 1/2$ by first drawing $2^{n(1-h_B(p)-\epsilon)}$ codewords i.i.d. according to a Bernoulli(1/2) distribution for some $\epsilon > 0$. Let $\mathcal{C}$ denote the resulting set of codewords. To guarantee the same red alert exponent, we only keep those codewords with Hamming weight $nq$ or greater and set the red alert codeword to be the all-zeros vector. We now bound the rate of this construction. Using Theorem 2, it can be shown that the probability that a Bernoulli(1/2) sequence $\mathbf{x}$ has Hamming weight at least $nq$ is upper bounded as
$$ P\left(\mathrm{wt}(\mathbf{x}) \ge nq\right) \le 2^{-nD(q\|0.5)}. \qquad (68) $$

Take $\mathcal{C}_q$ to be the subset of codewords in $\mathcal{C}$ with Hamming weight $nq$ or greater. Using (68), the expected size of $\mathcal{C}_q$ is upper bounded by
$$ 2^{n(1-h_B(p)-\epsilon)}\,2^{-nD(q\|0.5)} = 2^{n\left(1-h_B(p)-(1-h_B(q))-\epsilon\right)}. $$
A Chernoff bound shows that the probability that $\mathcal{C}_q$ contains significantly more codewords vanishes doubly exponentially in $n$. Furthermore, it can be shown that the average probability of error of $\mathcal{C}_q$ vanishes with $n$. Therefore, the rate of the conical codebook is $h_B(q) - h_B(p)$ for a red alert exponent of $D(q * p\,\|\,p)$.

Now, observe that $1/2 < q * p < q$ (unless either $q$ or $p$ is equal to 1/2), so $h_B(q) < h_B(q * p)$. Hence the rate of the conical construction is strictly less than that of the (optimal) constant composition construction. Intuitively, the usual i.i.d. Bernoulli(1/2) construction used to approach capacity does not pack codewords of higher (or lower) weights efficiently. Constraining the weight of codewords is essential to the hypothesis test that leads to the red alert exponent. The constant composition (or offset) construction succeeds because it optimizes the packing of codewords of a given weight. A similar phenomenon occurs in the AWGN setting, as shown below.
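The rate comparison above can be tabulated directly. The sketch below uses hypothetical helper names and measures rates in bits, as in this appendix. For a weight parameter $q > 1/2$ and crossover probability $p$, it computes three quantities: the constant composition rate $h_B(q * p) - h_B(p)$, the conical rate $h_B(q) - h_B(p)$, and their common red alert exponent $D(q * p\,\|\,p)$.

```python
import math


def h_b(x):
    """Binary entropy in bits."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)


def bsc_conv(q, p):
    """Binary convolution q * p = q(1 - p) + (1 - q)p."""
    return q * (1.0 - p) + (1.0 - q) * p


def kl_bits(a, b):
    """Binary KL divergence D(a || b) in bits."""
    return a * math.log2(a / b) + (1.0 - a) * math.log2((1.0 - a) / (1.0 - b))


def compare(q, p):
    """(constant-composition rate, conical rate, red alert exponent)."""
    qp = bsc_conv(q, p)
    return h_b(qp) - h_b(p), h_b(q) - h_b(p), kl_bits(qp, p)


for q, p in [(0.6, 0.1), (0.7, 0.1), (0.9, 0.05)]:
    print(q, p, compare(q, p))
```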

Appendix E 
AWGN Conical Codes: Red Alert Exponent 

For completeness, we review the AWGN conical code that we proposed in [21] and the resulting red alert exponent. The construction comprises three main steps:

1) Place the red alert codeword at the limit of the red alert power constraint, $\mathbf{x}(0) = -\sqrt{P_{\text{alert}}}\,\mathbf{1}$.

2) Draw $2^{n(C-\epsilon)}$ codewords i.i.d. according to a Gaussian distribution with mean zero and variance $P_{\text{avg}} - \epsilon$.

3) Of these codewords, only keep the first $2^{nR}$ that lie in the cone $\mathcal{V}_n(0, \mathbf{1}, \theta + \epsilon)$, where $\theta = \sin^{-1}\left(e^{-(C-R)}\right)$. (If there are fewer than $2^{nR}$ such codewords, declare an error.)

It can be shown that, with high probability, the resulting codebook contains $2^{nR}$ codewords inside the cone of half-angle $\theta$. We now turn to bounding the distance and angle from the standard decoding region to the red alert codeword.

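The distance can be bounded using the techniques used to prove Lemma 4. It follows that, for any $\delta > 0$ and $n$ large enough, the squared distance from the red alert codeword to a standard codeword plus noise is at least $L^2$ with high probability,
$$ P\left(\|-\mathbf{x}(0) + \mathbf{x}(w) + \mathbf{z}\|^2 \ge L^2\right) \ge 1 - \delta, \qquad (72) $$
$$ L^2 = n\left(P_{\text{alert}} + P_{\text{avg}} + 2\sqrt{P_{\text{alert}}P_{\text{avg}}}\cos\theta + N - \delta\right). \qquad (73) $$
Substituting in $\cos^2\theta = 1 - \sin^2\theta = 1 - e^{-2(C-R)}$, we get that $L^2$ is equal to
$$ n\left(P_{\text{alert}} + P_{\text{avg}} + N + 2\sqrt{P_{\text{alert}}P_{\text{avg}}\left(1 - e^{-2(C-R)}\right)} - \delta\right). $$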
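Comparing this squared distance with the $L^2$ of Lemma 10 shows where the offset construction gains. Up to vanishing terms, both have the form $n(P_{\text{alert}} + P_{\text{avg}} + N + 2\sqrt{P_{\text{alert}}X})$. For the conical code $X = P_{\text{avg}}(1 - e^{-2(C-R)})$, while for the offset code $X = P_{\text{avg}} + N(1 - e^{2R})$. A line of algebra shows that the difference of the two $X$ values is $N(1 - e^{2(R-C)}) \ge 0$. The sketch below checks this identity numerically, taking $N = 1$ and $P_{\text{avg}} = 5$ dB as illustrative values. It compares distances only; the conical exponent also depends on the angle bound.

```python
import math


def offset_inner(p_avg, noise_var, rate):
    """X for the offset code (Lemma 10): P_avg + N(1 - e^{2R})."""
    return p_avg + noise_var * (1.0 - math.exp(2.0 * rate))


def conical_inner(p_avg, noise_var, rate):
    """X for the conical code: P_avg (1 - e^{-2(C - R)})."""
    capacity = 0.5 * math.log1p(p_avg / noise_var)
    return p_avg * (1.0 - math.exp(-2.0 * (capacity - rate)))


p_avg, noise_var = 10.0 ** 0.5, 1.0  # 5 dB, with N = 1 assumed
capacity = 0.5 * math.log1p(p_avg / noise_var)
for rate in (0.0, 0.5 * capacity, 0.9 * capacity):
    gap = offset_inner(p_avg, noise_var, rate) - conical_inner(p_avg, noise_var, rate)
    print(rate, gap, noise_var * (1.0 - math.exp(2.0 * (rate - capacity))))
```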

Similarly, the techniques from Lemma 6 can be used to bound the angle. Let $\mathcal{V}_n(\mathbf{x}(0), 0, \psi)$ denote the cone centered on the red alert codeword with axis running towards the origin and half-angle $\psi$. For any $\delta > 0$ and $n$ large enough, if the half-angle $\psi$ is larger than




then the cone contains the codeword for message $w$ plus noise with high probability, i.e.,
$$ P\left(\mathbf{x}(w) + \mathbf{z} \in \mathcal{V}_n(\mathbf{x}(0), 0, \psi)\right) \ge 1 - \delta. \qquad (74) $$

Finally, these two bounds can be combined, as in the proof of Lemma 7, to get an achievable red alert exponent of




[Figure 10: red alert exponent versus rate (nats/channel use)]

Fig. 10. Comparison of the red alert exponent attained by an (optimal) offset code construction and a conical code construction with an average power constraint of 0, 5, and 10 dB with $P_{\text{alert}} = 2P_{\text{avg}}$.

In Figure 10, we have plotted this red alert exponent alongside the optimal one derived via the offset construction for average power constraints $P_{\text{avg}} = 0$, 5, and 10 dB with $P_{\text{alert}} = 2P_{\text{avg}}$.

²Note that this is an improvement over the error exponent reported in Theorem 1 of [21], since we have used tighter upper bounds. Specifically, in [21], we did not fully exploit the fact that, with high probability, both the noise and the standard codewords are nearly orthogonal to any fixed vector and to each other.